Abstract

Mean Field Game systems describe equilibrium configurations in differential games with 
infinitely many infinitesimal interacting agents. We introduce a learning procedure (similar 
to the Fictitious Play) for these games and show its convergence when the Mean Field Game 
is potential.

1 Introduction

Mean Field Games are a class of differential games in which each agent is infinitesimal and
interacts with a huge population of other agents. These games were introduced simultaneously
by Lasry and Lions [23, 24, 25, 26] and by Huang, Malhamé and Caines [21] (a discrete-in-time
version of these games was in fact previously known under the terminology of heterogeneous
models in economics; see for instance [3]). The classical notion of solution of a Mean Field
Game (abbreviated MFG) is given by a pair of maps $(u,m)$, where $u = u(t,x)$ is the value
function of a typical small player while $m = m(t,x)$ denotes the density at time $t$ and at
position $x$ of the population. The value function $u$ satisfies a Hamilton-Jacobi equation, in
which $m$ enters as a parameter and describes the influence of the population on the cost of
each agent, while the density $m$ evolves in time according to a Fokker-Planck equation in
which $u$ enters as a drift. More precisely, the pair $(u,m)$ is a solution of the MFG system,
which reads

\[
\begin{cases}
(i) & -\partial_t u - \sigma\Delta u + H(x,\nabla u(t,x)) = f(x,m(t)),\\
(ii) & \partial_t m - \sigma\Delta m - \mathrm{div}\big(m\,D_pH(x,\nabla u)\big) = 0,\\
& m(0,x) = m_0(x), \qquad u(T,x) = g(x,m(T)).
\end{cases}
\tag{1}
\]

In the above system, $T > 0$ is the horizon of the game and $\sigma$ is a nonnegative parameter
describing the intensity of the (individual) noise to which each agent is subjected (for
simplicity we assume that either $\sigma = 0$, no noise, or $\sigma = 1$, some individual noise).
The map $H$ is the Hamiltonian of the control problem (thus typically convex in the gradient
variable). The running cost $f$ and the terminal cost $g$ depend on the one hand on the position
of the agent and, on the other hand, on the population density. Note that, in order to solve the
(backward) Hamilton-Jacobi equation (i.e., the optimal control problem of each agent), one has
to know the evolution of the population density, while the Fokker-Planck equation depends on
the optimal strategies of the agents (through the drift term
$-\mathrm{div}(m\,D_pH(x,\nabla u))$). The MFG system therefore formalizes an equilibrium
configuration.

Under suitable assumptions recalled below, the MFG system (1) has at least one solution.
This solution is even unique under a monotonicity condition on $f$ and $g$. Under this
condition, one can also show that it is the limit of symmetric Nash equilibria for a finite
number of players as the number of players tends to infinity [14]; moreover, the optimal
strategy given by the solution of the MFG system can be implemented in the game with finitely
many players to give an approximate Nash equilibrium [21, 15]. MFG systems have been widely
used in several areas ranging from engineering to economics, either under the terminology of
heterogeneous agent models [3, 7, 22], or under the name of MFG [1, 2, 18, 20].

In the present paper we raise the question of the actual formation of the MFG equilibrium.
Indeed, the game being quite involved, it is unrealistic to assume that the agents can actually
compute the equilibrium configuration. This seems to indicate that, if the equilibrium
configuration arises, it is because the agents have learned how to play the game. For instance,
people driving every day from home to work are dealing with such a learning issue. Every day
they try to forecast the traffic and choose their optimal path accordingly, minimizing for
instance the journey time and/or the fuel consumed. If their belief on the traffic turns out not
to be correct, they update their estimate, and so on. The question is whether such a procedure
leads to stability or not.

The question of learning is a very classical one in game theory (see, for instance, the
monograph [19]). There is by now a very large number of learning procedures for one-shot games
in the literature. In the present paper we focus on a very classical and simple one: the
Fictitious Play. The Fictitious Play was first introduced by Brown [8]. In this learning
procedure, every player plays at each step the best response action with respect to the average
of the previous actions of the other players. Fictitious Play does not necessarily converge, as
shown by the counterexample of Shapley [31], but it is known to converge for several classes of
one-shot games: for instance for zero-sum games (Robinson [30]), for 2×2 games (Miyasawa [27])
and for potential games (Monderer and Shapley [29]).
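The best-response-to-the-empirical-average rule described above can be made concrete on a
one-shot game. The following is only a minimal sketch, assuming a hypothetical 2×2 coordination
game with identical payoffs (hence a potential game); the payoff matrix, the initial actions and
the function name are illustrative choices and do not come from the references.

```python
import numpy as np

# Hypothetical 2x2 coordination game with identical payoffs (hence potential):
# payoff[i, j] is what each player receives when player 1 plays i and player 2 plays j.
payoff = np.array([[2.0, 0.0],
                   [0.0, 1.0]])


def fictitious_play(payoff, first_actions=(0, 1), n_rounds=200):
    """Each round, every player best-responds to the empirical frequency
    of the opponent's past actions."""
    counts = np.zeros((2, 2))                   # counts[k, a] = times player k played a
    counts[0, first_actions[0]] += 1
    counts[1, first_actions[1]] += 1
    for _ in range(n_rounds):
        freq1 = counts[0] / counts[0].sum()     # empirical play of player 1
        freq2 = counts[1] / counts[1].sum()     # empirical play of player 2
        a1 = int(np.argmax(payoff @ freq2))     # player 1's best reply to player 2's average
        a2 = int(np.argmax(payoff.T @ freq1))   # player 2's best reply to player 1's average
        counts[0, a1] += 1
        counts[1, a2] += 1
    return counts / counts.sum(axis=1, keepdims=True)


empirical = fictitious_play(payoff)
```

In this toy run the players start miscoordinated, and after a couple of rounds both settle on
the first action, so the empirical frequencies concentrate on a pure Nash equilibrium.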

Note that, in our setting, the question of learning makes all the more sense since the game
is particularly intricate. Our aim is to define a Fictitious Play for the MFG system and to
prove the convergence of this procedure under suitable assumptions on the couplings $f$ and $g$.
The Fictitious Play for the MFG system runs as follows: the players start with a smooth initial
belief $(m^0(t))_{t\in[0,T]}$. At the beginning of stage $n+1$, the players, having observed the
same past, share the same belief $(\bar m^n(t))_{t\in[0,T]}$ on the evolving density of the
population. They solve their corresponding optimal control problem, with value function
$u^{n+1}$, accordingly. When all players actually implement their optimal control, the
population density evolves in time and the players observe the resulting evolution
$(m^{n+1}(t))_{t\in[0,T]}$. At the end of stage $n+1$ the players update their belief according
to the same rule for all players, which consists in computing the average of their observations
up to stage $n+1$. This leads us to define by induction the sequences $u^n$, $m^n$, $\bar m^n$
by:

\[
\begin{cases}
(i) & -\partial_t u^{n+1} - \sigma\Delta u^{n+1} + H(x,\nabla u^{n+1}(t,x)) = f(x,\bar m^n(t)),\\
(ii) & \partial_t m^{n+1} - \sigma\Delta m^{n+1} - \mathrm{div}\big(m^{n+1}D_pH(x,\nabla u^{n+1})\big) = 0,\\
& m^{n+1}(0) = m_0, \qquad u^{n+1}(x,T) = g(x,\bar m^n(T)),
\end{cases}
\tag{2}
\]

where $\bar m^n = \frac1n\sum_{k=1}^n m^k$ for $n \ge 1$, with the convention $\bar m^0 = m^0$.
Indeed, $u^{n+1}$ is the value function at stage $n+1$ if the belief of the players on the
evolving density is $\bar m^n$, and thus solves (2)-(i). The actual density then evolves
according to the Fokker-Planck equation (2)-(ii).

Our main result is that, under suitable assumptions, this learning procedure converges, i.e.,
any cluster point of the pre-compact sequence $(u^n,m^n)$ is a solution of the MFG system (1)
(by compact, we mean compact for the uniform convergence). Of course, if in addition the
solution of the MFG system (1) is unique, then the full sequence converges. Let us recall (see
[25]) that this uniqueness holds for instance if $f$ and $g$ are monotone:

\[
\int_{\mathbb T^d} \big(f(x,m) - f(x,m')\big)\,d(m-m')(x) \ge 0, \qquad
\int_{\mathbb T^d} \big(g(x,m) - g(x,m')\big)\,d(m-m')(x) \ge 0
\]

for any probability measures $m$, $m'$. This condition is generally interpreted as an aversion
of the agents to congestion. Our key assumption for the convergence result is that $f$ and $g$
derive from potentials. By this we mean that there exist $F = F(m)$ and $G = G(m)$ such that

\[
f(x,m) = \frac{\delta F}{\delta m}(m,x) \qquad\text{and}\qquad g(x,m) = \frac{\delta G}{\delta m}(m,x).
\]

The above derivative, in the space of measures, is introduced in subsection 1.2, the definition
being borrowed from [14]. Our assumption actually ensures that our MFG system is also “a
potential game” (in the flavor of Monderer and Shapley [28]), so that the MFG system falls into
a framework closely related to that of Monderer and Shapley [29]. Compared to [29], however,
we face two issues. First, we have an infinite population of players, and the state space and
the action space are also infinite. Second, the game has a much more involved structure than in
[29]. In particular, the potential for our game is far from being straightforward. We consider
two different frameworks. In the first one, the so-called second order MFG systems where
$\sigma = 1$ (which corresponds to the case where the dynamics of the players are perturbed by
independent noise), the potential is defined as a map of the evolving population density. This
is reminiscent of the variational structure for the MFG system as introduced in [25] and
exploited in [10, 13] for instance. The proof of the convergence then strongly relies on the
regularity properties of the value function and of the population density (i.e., of the $u^n$
and $m^n$). The second framework is for first order MFG systems, where $\sigma = 0$. In contrast
with the previous case, the lack of regularity of the value function and of the population
density prevents us from defining the same Fictitious Play and the same potential. To overcome
this difficulty, we lift the problem to the space of curves, which is the natural space of
strategies. We define the Fictitious Play and a potential in this setting, and then prove the
convergence, first for the infinite population and then for a large, but finite, one.

As far as we are aware, our paper is the first one to consider a learning procedure in the
framework of mean field games. Let us nevertheless point out that, for a particular class of MFG
systems (quadratic Hamiltonians, local coupling), Guéant introduces in [17] an algorithm which
is closely related to a replicator dynamics: namely, it is exactly (2) in which one replaces
$\bar m^n$ by $m^n$ in (2)-(i). The convergence is proved by using a kind of monotonicity of the
sequence. This monotonicity does not hold in the more intricate framework considered here.

For simplicity we work in the periodic setting: we assume that the maps $H$, $f$ and $g$ are
periodic in the space variable (and thus actually defined on the torus
$\mathbb T^d = \mathbb R^d/\mathbb Z^d$). This simplifies the estimates and the notation.
However, we do not think that the result changes in a substantial way if the state space is
$\mathbb R^d$ or a subdomain of $\mathbb R^d$, with suitable boundary conditions.

The paper is organized as follows: we complete the introduction by fixing the main notation
and stating the basic assumptions on the data. Then we define the notion of potential MFG and
characterize the couplings which derive from a potential. Section 2 is devoted to the Fictitious
Play for second order MFG systems while section 3 deals with the first order ones.

Acknowledgement: The first author was partially supported by the ANR (Agence Nationale
de la Recherche) projects ANR-10-BLAN 0112, ANR-12-BS01-0008-01 and
ANR-14-ACHN-0030-01.

1.1 Preliminaries and Assumptions 

If $X$ is a metric space, we denote by $\mathcal P(X)$ the set of Borel probability measures
on $X$. When $X = \mathbb T^d$ ($\mathbb T^d$ being the torus $\mathbb R^d/\mathbb Z^d$), we
endow $\mathcal P(\mathbb T^d)$ with the distance

\[
d_1(\mu,\nu) = \sup_h \left\{ \int_{\mathbb T^d} h(x)\,d(\mu-\nu)(x) \right\},
\qquad \mu,\nu \in \mathcal P(\mathbb T^d),
\tag{3}
\]

where the supremum is taken over all the maps $h : \mathbb T^d \to \mathbb R$ which are
1-Lipschitz continuous. Then $d_1$ metrizes the weak-* convergence of measures on $\mathbb T^d$.

The maps $H$, $f$ and $g$ are periodic in the space arguments:
$H : \mathbb T^d \times \mathbb R^d \to \mathbb R$ while
$f, g : \mathbb T^d \times \mathcal P(\mathbb T^d) \to \mathbb R$. In the same way, the initial
condition $m_0 \in \mathcal P(\mathbb T^d)$ is periodic in space and is assumed to be absolutely
continuous with a smooth density.

We now state our key assumptions on the data: these conditions are valid throughout the
paper. On the initial measure $m_0$, we assume that

\[
m_0 \text{ has a smooth density (again denoted } m_0\text{)}.
\tag{4}
\]

Concerning the Hamiltonian, we suppose that $H$ is of class $C^2$ on
$\mathbb T^d \times \mathbb R^d$ and quadratic-like in the second variable: for some constant
$\bar C > 0$,

\[
H \in C^2(\mathbb T^d \times \mathbb R^d) \quad\text{and}\quad
\frac{1}{\bar C}\, I_d \le D^2_{pp}H(x,p) \le \bar C\, I_d
\qquad \forall (x,p) \in \mathbb T^d \times \mathbb R^d.
\tag{5}
\]

Moreover, we suppose that $D_xH$ satisfies the lower bound:

\[
\langle D_xH(x,p), p\rangle \ge -\bar C\,(|p|^2 + 1).
\tag{6}
\]

The maps $f$ and $g$ are supposed to be globally Lipschitz continuous (in both variables) and
regularizing:

\[
\begin{array}{l}
\text{the map } m \mapsto f(\cdot,m) \text{ is Lipschitz continuous from } \mathcal P(\mathbb T^d) \text{ to } C^2(\mathbb T^d),\\
\text{while the map } m \mapsto g(\cdot,m) \text{ is Lipschitz continuous from } \mathcal P(\mathbb T^d) \text{ to } C^3(\mathbb T^d).
\end{array}
\tag{7}
\]

In particular, there is $C > 0$ such that

\[
\sup_{m \in \mathcal P(\mathbb T^d)} \|f(\cdot,m)\|_{C^2} + \|g(\cdot,m)\|_{C^3} \le C.
\tag{8}
\]

Assumptions (4), (5), (6) and (7), and consequently (8), are in force throughout the paper. As
explained below, they ensure that the MFG system has at least one solution.

To ensure the uniqueness of the solution, we sometimes require $f$ and $g$ to be monotone: for
any $m, m' \in \mathcal P(\mathbb T^d)$,

\[
\int_{\mathbb T^d} \big(f(x,m) - f(x,m')\big)\,d(m-m')(x) \ge 0, \qquad
\int_{\mathbb T^d} \big(g(x,m) - g(x,m')\big)\,d(m-m')(x) \ge 0.
\tag{9}
\]

This condition can be interpreted as a dislike of congested areas by the agents.


1.2 Potential Mean Field Games 

In this section we introduce the main structure condition on the data $f$ and $g$ of the game:
we assume that $f$ and $g$ are the derivatives, with respect to the measure, of potential maps
$F$ and $G$. In this case we say that $f$ and $g$ derive from a potential.

Let us first explain what we mean by a derivative with respect to a measure. Let
$F : \mathcal P(\mathbb T^d) \to \mathbb R$ be a continuous map. We say that the continuous map
$\frac{\delta F}{\delta m} : \mathcal P(\mathbb T^d) \times \mathbb T^d \to \mathbb R$ is the
derivative of $F$ if, for any $m, m' \in \mathcal P(\mathbb T^d)$,

\[
\lim_{s \to 0^+} \frac{F((1-s)m + sm') - F(m)}{s}
= \int_{\mathbb T^d} \frac{\delta F}{\delta m}(m,x)\,d(m'-m)(x).
\tag{10}
\]

As $\frac{\delta F}{\delta m}$ is continuous, this equality can be equivalently written as

\[
F(m') - F(m) = \int_0^1\!\!\int_{\mathbb T^d} \frac{\delta F}{\delta m}\big((1-s)m + sm', x\big)\,d(m'-m)(x)\,ds,
\]

for any $m, m' \in \mathcal P(\mathbb T^d)$. Note that $\frac{\delta F}{\delta m}$ is defined
only up to an additive constant. To fix the ideas we assume therefore that

\[
\int_{\mathbb T^d} \frac{\delta F}{\delta m}(m,x)\,dm(x) = 0 \qquad \forall m \in \mathcal P(\mathbb T^d).
\]

We often use the notation

\[
\frac{\delta F}{\delta m}(m)(m'-m) := \int_{\mathbb T^d} \frac{\delta F}{\delta m}(m,x)\,d(m'-m)(x).
\]
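Definition 1.1. A Mean Field Game is called a Potential Mean Field Game if the instantaneous
and final cost functions $f, g : \mathbb T^d \times \mathcal P(\mathbb T^d) \to \mathbb R$ derive
from potentials, i.e., there exist $F, G : \mathcal P(\mathbb T^d) \to \mathbb R$ such that

\[
\frac{\delta F}{\delta m} = f, \qquad \frac{\delta G}{\delta m} = g.
\]

In the rest of the section we characterize the maps $f$ which derive from a potential. Although
this is not used in the rest of the paper, this characterization is natural and we believe that
it is of independent interest.

To proceed we assume for the rest of the section that, for any $x \in \mathbb T^d$,
$f(x,\cdot)$ has a derivative and that this derivative
$\frac{\delta f}{\delta m} : \mathbb T^d \times \mathcal P(\mathbb T^d) \times \mathbb T^d \to \mathbb R$
is continuous.

Proposition 1.2. The map $f : \mathbb T^d \times \mathcal P(\mathbb T^d) \to \mathbb R$ derives
from a potential if and only if

\[
\frac{\delta f}{\delta m}(y,m,x) = \frac{\delta f}{\delta m}(x,m,y)
\qquad \forall x,y \in \mathbb T^d,\ \forall m \in \mathcal P(\mathbb T^d).
\]

Proof. First assume that $f$ derives from a potential
$F : \mathcal P(\mathbb T^d) \to \mathbb R$. Differentiating in $m$ the relation
$\frac{\delta F}{\delta m} = f$, we obtain

\[
\frac{\delta^2 F}{\delta m^2}(m,x,y) = \frac{\delta f}{\delta m}(x,m,y)
\qquad \forall x,y \in \mathbb T^d,\ m \in \mathcal P(\mathbb T^d).
\]

As $\frac{\delta^2 F}{\delta m^2}(m,x,y)$ is symmetric in $(x,y)$ (see [14]), so is
$\frac{\delta f}{\delta m}(x,m,y)$.

Let us now assume that $\frac{\delta f}{\delta m}(x,m,y)$ is symmetric in $(x,y)$. Let us fix
$m_0 \in \mathcal P(\mathbb T^d)$ and set, for any $m \in \mathcal P(\mathbb T^d)$,

\[
F(m) = \int_0^1\!\!\int_{\mathbb T^d} f\big(x,(1-t)m_0 + tm\big)\,d(m-m_0)(x)\,dt.
\tag{11}
\]

We claim that $F$ is a potential for $f$. Indeed, as $f$ has a continuous derivative, so has $F$,
with

\[
\frac{\delta F}{\delta m}(m,y)
= \int_0^1\!\!\int_{\mathbb T^d} t\,\frac{\delta f}{\delta m}\big(x,(1-t)m_0 + tm, y\big)\,d(m-m_0)(x)\,dt
+ \int_0^1 f\big(y,(1-t)m_0 + tm\big)\,dt.
\]

As, by the symmetry assumption,

\[
\frac{d}{dt} f\big(y,(1-t)m_0 + tm\big)
= \int_{\mathbb T^d} \frac{\delta f}{\delta m}\big(y,(1-t)m_0 + tm, x\big)\,d(m-m_0)(x)
= \int_{\mathbb T^d} \frac{\delta f}{\delta m}\big(x,(1-t)m_0 + tm, y\big)\,d(m-m_0)(x),
\]

we have therefore, after integrating by parts in $t$ in the expression of
$\frac{\delta F}{\delta m}$,

\[
\frac{\delta F}{\delta m}(m,y) = \Big[\, t\, f\big(y,(1-t)m_0 + tm\big) \Big]_0^1 = f(y,m).
\]

□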
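The symmetry criterion of Proposition 1.2 can be checked numerically on a finite-state
analogue. The sketch below is an illustration under assumptions that are not in the text: the
torus is replaced by three states, measures by probability vectors, and the coupling is the
hypothetical linear map $f(x,m) = \sum_y K[x,y]\,m[y]$, whose derivative in $m$ is the kernel
$K[x,y]$. The candidate potential is built from formula (11) and tested against the defining
relation (10).

```python
import numpy as np


def f(K, m):
    """Finite-state coupling f(x, m) = sum_y K[x, y] m[y], one value per state x."""
    return K @ m


def potential(K, m, m_ref, n_quad=64):
    """Candidate potential from formula (11), using midpoint quadrature in t."""
    ts = (np.arange(n_quad) + 0.5) / n_quad
    return np.mean([f(K, (1 - t) * m_ref + t * m) @ (m - m_ref) for t in ts])


def derivative_gap(K, m, m_prime, m_ref, s=1e-6):
    """Distance between the slope of F along m -> m' and sum_x f(x, m) (m' - m)(x).
    It is (numerically) zero exactly when F is a potential for f."""
    slope = (potential(K, m + s * (m_prime - m), m_ref) - potential(K, m, m_ref)) / s
    return abs(slope - f(K, m) @ (m_prime - m))


m_ref = np.array([1.0, 0.0, 0.0])
m = np.array([0.0, 1.0, 0.0])
m_prime = np.array([0.0, 0.0, 1.0])

K_cyclic = np.array([[0.0, 1.0, 0.0],      # non-symmetric kernel
                     [0.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0]])
K_symmetric = K_cyclic + K_cyclic.T        # symmetric kernel

gap_symmetric = derivative_gap(K_symmetric, m, m_prime, m_ref)
gap_asymmetric = derivative_gap(K_cyclic, m, m_prime, m_ref)
```

With the symmetric kernel the gap vanishes up to discretization error, whereas the cyclic
kernel leaves a gap of order one, in line with the "only if" part of the proposition.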


2 The Fictitious Play for second order MFG systems 

In this section, we study a learning procedure for the second order MFG system:

\[
\begin{cases}
(i) & -\partial_t u - \Delta u + H(x,\nabla u(t,x)) = f(x,m(t)), & (t,x) \in [0,T] \times \mathbb T^d,\\
(ii) & \partial_t m - \Delta m - \mathrm{div}\big(m\,D_pH(x,\nabla u)\big) = 0, & (t,x) \in [0,T] \times \mathbb T^d,\\
& m(0) = m_0, \qquad u(x,T) = g(x,m(T)), & x \in \mathbb T^d.
\end{cases}
\tag{12}
\]

Let us recall (see [25]) that, under our assumptions (4), (5), (6), (7), there exists at least
one classical solution to (12) (i.e., one for which all the derivatives involved exist and are
continuous). If furthermore (9) holds, then the solution is unique.

2.1 The learning rule and the convergence result

The Fictitious Play can be written as follows: given a smooth initial guess
$m^0 \in C^0([0,T], \mathcal P(\mathbb T^d))$, we define by induction sequences
$u^n, m^n : [0,T] \times \mathbb T^d \to \mathbb R$ by:

\[
\begin{cases}
(i) & -\partial_t u^{n+1} - \Delta u^{n+1} + H(x,\nabla u^{n+1}(t,x)) = f(x,\bar m^n(t)), & (t,x) \in [0,T] \times \mathbb T^d,\\
(ii) & \partial_t m^{n+1} - \Delta m^{n+1} - \mathrm{div}\big(m^{n+1}D_pH(x,\nabla u^{n+1})\big) = 0, & (t,x) \in [0,T] \times \mathbb T^d,\\
& m^{n+1}(0) = m_0, \qquad u^{n+1}(x,T) = g(x,\bar m^n(T)), & x \in \mathbb T^d,
\end{cases}
\tag{13}
\]

where $\bar m^n(t,x) = \frac1n\sum_{k=1}^n m^k(t,x)$ (with the convention $\bar m^0 = m^0$). The
interpretation is that, at the beginning of stage $n+1$, the players have the same belief
$(\bar m^n(t))_{t\in[0,T]}$ of the future density of the population and solve their
corresponding optimal control problem, with value function $u^{n+1}$. Their optimal
(closed-loop) control is then $(t,x) \mapsto -D_pH(x,\nabla u^{n+1}(t,x))$. When all players
actually implement this control, the population density evolves in time according to (13)-(ii).
We assume that the players observe the resulting evolution of the population density
$(m^{n+1}(t))_{t\in[0,T]}$. At the end of stage $n+1$ the players update their guess by
computing the average of their observations up to stage $n+1$.
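The loop "solve backward against the belief, transport the population forward, average" can be
sketched on a toy model. What follows is not a discretization of (13): it assumes a
hypothetical deterministic, discrete-time game on a ring of finitely many states, with moves in
{-1, 0, 1}, a quadratic move cost, running cost f(x, m) = m(x) (a congestion term) and zero
terminal cost. All names and parameters are illustrative.

```python
import numpy as np

MOVES = np.array([-1, 0, 1])          # admissible moves on the ring Z/NZ


def best_response(belief, move_cost=0.5):
    """Backward induction against belief[t, x] with running cost f(x, m) = m(x)
    and terminal cost g = 0; returns the optimal move policy[t, x]."""
    n_times, n_states = belief.shape
    u = np.zeros(n_states)            # value function at the final time
    policy = np.zeros((n_times - 1, n_states), dtype=int)
    for t in range(n_times - 2, -1, -1):
        # q[k, x]: cost of move MOVES[k] from x, then playing optimally afterwards
        q = np.stack([move_cost * a * a + np.roll(u, -a) for a in MOVES])
        policy[t] = MOVES[np.argmin(q, axis=0)]
        u = belief[t] + q.min(axis=0)
    return policy


def transport(m0, policy):
    """Push the initial distribution forward along the policy (discrete continuity equation)."""
    n_steps, n_states = policy.shape
    m = np.zeros((n_steps + 1, n_states))
    m[0] = m0
    states = np.arange(n_states)
    for t in range(n_steps):
        np.add.at(m[t + 1], (states + policy[t]) % n_states, m[t])
    return m


def fictitious_play(m0, n_steps=8, n_stages=100):
    belief = np.tile(m0, (n_steps + 1, 1))        # initial belief: nobody moves
    for n in range(n_stages):
        flow = transport(m0, best_response(belief))
        belief += (flow - belief) / (n + 1)       # running average of the observed flows
    return belief


m0 = np.zeros(12)
m0[:3] = 1.0 / 3.0                                # crowd concentrated on three states
belief = fictitious_play(m0)
```

The update `belief += (flow - belief) / (n + 1)` is the incremental form of the average of the
observed flows; after the first stage the initial guess no longer contributes.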

In order to show the convergence of the Fictitious Play, we assume that the MFG is potential,
i.e., there are potential functions $F, G : \mathcal P(\mathbb T^d) \to \mathbb R$ such that

\[
f(x,m) = \frac{\delta F}{\delta m}(m,x) \qquad\text{and}\qquad g(x,m) = \frac{\delta G}{\delta m}(m,x).
\tag{14}
\]

We also assume that $m_0$ is smooth and positive.

Theorem 2.1. Under the assumptions (4), (5), (6), (7) and (14), the family
$\{(u^n,m^n)\}_{n\in\mathbb N}$ is uniformly continuous and any cluster point is a solution to
the second order MFG system (12).

If, in addition, the monotonicity condition (9) holds, then the whole sequence
$\{(u^n,m^n)\}_{n\in\mathbb N}$ converges to the unique solution of (12).

The key remark to prove Theorem 2.1 is that the game itself has a potential. Given
$m \in C^0([0,T] \times \mathbb T^d)$ and $w \in C^0([0,T] \times \mathbb T^d, \mathbb R^d)$ such
that, in the sense of distributions,

\[
\partial_t m - \Delta m + \mathrm{div}(w) = 0 \ \text{ in } (0,T) \times \mathbb T^d, \qquad m(0) = m_0,
\]

let

\[
\Phi(m,w) = \int_0^T\!\!\int_{\mathbb T^d} m(t,x)\,H^*\big(x,-w(t,x)/m(t,x)\big)\,dx\,dt
+ \int_0^T F(m(t))\,dt + G(m(T)),
\]

where $H^*$ is the convex conjugate of $H$:

\[
H^*(x,q) = \sup_{p \in \mathbb R^d} \langle p,q\rangle - H(x,p).
\]

In the definition of $\Phi$, we set by convention, when $m = 0$,

\[
m\,H^*(x,-w/m) =
\begin{cases}
0 & \text{if } w = 0,\\
+\infty & \text{otherwise.}
\end{cases}
\]

For the sake of simplicity, we often drop the integration variables $(t,x)$ and write the
potential in a shorter form:

\[
\Phi(m,w) = \int_0^T\!\!\int_{\mathbb T^d} m\,H^*(x,-w/m) + \int_0^T F(m(t))\,dt + G(m(T)).
\]

It is explained in [25], section 2.6, that $(u,m)$ is a solution to (12) if and only if $(m,w)$
is a minimizer of $\Phi$ and $w = -m\,D_pH(\cdot,\nabla u)$. We show here that the same map can
be used as a potential in the Fictitious Play: $\Phi$ (almost) decreases at each step of the
Fictitious Play and the derivative of $\Phi$ does not vary too much at each step. Then the proof
of [29] applies.


2.2 Proof of the convergence 

Before starting the proof of Theorem 2.1, let us fix some notation. First we set

\[
w^n(t,x) = -m^n(t,x)\,D_pH(x,\nabla u^n(t,x)) \qquad\text{and}\qquad
\bar w^n(t,x) = \frac1n \sum_{k=1}^n w^k(t,x).
\tag{15}
\]

Since the Fokker-Planck equation is linear, we have:

\[
\partial_t \bar m^{n+1} - \Delta \bar m^{n+1} + \mathrm{div}(\bar w^{n+1}) = 0, \quad t \in [0,T],
\qquad \bar m^{n+1}(0) = m_0.
\tag{16}
\]

Recall that $H^*$ is the convex conjugate of $H$:

\[
H^*(x,q) = \sup_{p \in \mathbb R^d} \langle p,q\rangle - H(x,p).
\]

We define $p(x,q)$ as the maximizer in the above right-hand side:

\[
H^*(x,q) = \langle p(x,q), q\rangle - H(x,p(x,q)).
\tag{17}
\]

Note that $p$ is characterized by $q = D_pH(x,p(x,q))$. The uniqueness comes from the fact that
$H$ satisfies $D^2_{pp}H \ge \frac{1}{\bar C} I_d$, which yields that $D_pH(x,\cdot)$ is
one-to-one. We note for later use that, for $m > 0$,

\[
m\,H^*\Big(x,-\frac{w}{m}\Big) = \sup_{p \in \mathbb R^d} -\langle p,w\rangle - m\,H(x,p).
\]

Next we state a standard result on uniformly convex functions, the proof of which is postponed:

Lemma 2.2. Under assumption (5), we have for any $x \in \mathbb T^d$, $p, q \in \mathbb R^d$:

\[
H(x,p) + H^*(x,q) - \langle p,q\rangle \ge \frac{1}{2\bar C}\,|q - D_pH(x,p)|^2.
\]
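As a sanity check on Lemma 2.2, one can work out the purely quadratic Hamiltonian
$H(x,p) = \frac12|p|^2$. This Hamiltonian is an illustrative choice, not one singled out in the
text; it satisfies (5) with the constant equal to 1, and the inequality of the lemma then holds
with equality:

```latex
\begin{align*}
H^*(x,q) &= \sup_{p\in\mathbb R^d}\Big(\langle p,q\rangle - \tfrac12|p|^2\Big) = \tfrac12|q|^2,
\qquad D_pH(x,p) = p,\\
H(x,p) + H^*(x,q) - \langle p,q\rangle
 &= \tfrac12|p|^2 + \tfrac12|q|^2 - \langle p,q\rangle
  = \tfrac12\,|q-p|^2
  = \tfrac12\,|q - D_pH(x,p)|^2 .
\end{align*}
```

In this example the left-hand side is exactly the Fenchel-Young gap, which vanishes only when
$q = D_pH(x,p)$; the lemma quantifies this gap for general uniformly convex Hamiltonians.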

The following Lemma explains that $\Phi$ is “almost decreasing” along the sequence
$(\bar m^n,\bar w^n)$.

Lemma 2.3. There exists a constant $C > 0$ such that, for any $n \in \mathbb N^*$,

\[
\Phi(\bar m^{n+1},\bar w^{n+1}) - \Phi(\bar m^n,\bar w^n) \le -\frac{1}{C}\,\frac{a_n}{n} + \frac{C}{n^2},
\tag{18}
\]

where
$a_n = \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1}\,\big|\bar w^{n+1}/\bar m^{n+1} - w^{n+1}/m^{n+1}\big|^2$.

Throughout the proofs, $C$ denotes a constant which depends on the data of the problem only
(i.e., on $H$, $f$, $g$ and $m_0$) and might change from line to line. We systematically use the
fact that, as $f$ and $g$ admit $F$ and $G$ as potentials and are globally Lipschitz continuous,
there exists a constant $C > 0$ such that, for any $m, m' \in \mathcal P(\mathbb T^d)$ and
$s \in [0,1]$,

\[
\Big| F(m + s(m'-m)) - F(m) - s\int_{\mathbb T^d} f(x,m)\,d(m'-m)(x) \Big| \le C|s|^2,
\]

\[
\Big| G(m + s(m'-m)) - G(m) - s\int_{\mathbb T^d} g(x,m)\,d(m'-m)(x) \Big| \le C|s|^2.
\]

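Proof of Lemma 2.3. We have

\[
\Phi(\bar m^{n+1},\bar w^{n+1}) = \Phi(\bar m^n,\bar w^n) + A + B,
\]

where

\begin{align*}
A &= \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1} H^*(x,-\bar w^{n+1}/\bar m^{n+1}) - \bar m^n H^*(x,-\bar w^n/\bar m^n),\\
B &= \int_0^T \big(F(\bar m^{n+1}(t)) - F(\bar m^n(t))\big)\,dt + \big(G(\bar m^{n+1}(T)) - G(\bar m^n(T))\big).
\end{align*}

Since $F$ is $C^1$ with respect to $m$ with derivative $f$ (and likewise for $G$ and $g$), we
have

\[
B \le \int_0^T\!\!\int_{\mathbb T^d} f(x,\bar m^n(t))(\bar m^{n+1} - \bar m^n)
+ \int_{\mathbb T^d} g(x,\bar m^n(T))(\bar m^{n+1}(T) - \bar m^n(T)) + \frac{C}{n^2}.
\]

As $\bar m^{n+1} - \bar m^n = \frac1n (m^{n+1} - \bar m^{n+1})$, we find after rearranging:

\[
B \le \frac1n \int_0^T\!\!\int_{\mathbb T^d} f(x,\bar m^n(t))(m^{n+1} - \bar m^{n+1})
+ \frac1n \int_{\mathbb T^d} g(x,\bar m^n(T))(m^{n+1}(T) - \bar m^{n+1}(T)) + \frac{C}{n^2}.
\]

Using now the equation satisfied by $u^{n+1}$ we get

\begin{align*}
B &\le \frac1n \int_0^T\!\!\int_{\mathbb T^d} \big(-\partial_t u^{n+1} - \Delta u^{n+1} + H(x,\nabla u^{n+1})\big)(m^{n+1} - \bar m^{n+1})\\
&\qquad + \frac1n \int_{\mathbb T^d} g(x,\bar m^n(T))(m^{n+1}(T) - \bar m^{n+1}(T)) + \frac{C}{n^2}\\
&= \frac1n \int_0^T\!\!\int_{\mathbb T^d} \big(\partial_t(m^{n+1} - \bar m^{n+1}) - \Delta(m^{n+1} - \bar m^{n+1})\big)\,u^{n+1}\\
&\qquad + \frac1n \int_0^T\!\!\int_{\mathbb T^d} H(x,\nabla u^{n+1})(m^{n+1} - \bar m^{n+1}) + \frac{C}{n^2},
\end{align*}

where we have integrated by parts to obtain the second expression (the terminal terms cancel
because $u^{n+1}(\cdot,T) = g(\cdot,\bar m^n(T))$, and $m^{n+1}(0) = \bar m^{n+1}(0) = m_0$).
Using now the equation satisfied by $m^{n+1} - \bar m^{n+1}$ and integrating again by parts, we
obtain

\[
B \le \frac1n \int_0^T\!\!\int_{\mathbb T^d} \langle w^{n+1} - \bar w^{n+1}, \nabla u^{n+1}\rangle
+ H(x,\nabla u^{n+1})(m^{n+1} - \bar m^{n+1}) + \frac{C}{n^2}.
\]

Note that, by Lemma 2.2,

\[
-\langle \bar w^{n+1},\nabla u^{n+1}\rangle - H(x,\nabla u^{n+1})\,\bar m^{n+1}
\le \bar m^{n+1} H^*(x,-\bar w^{n+1}/\bar m^{n+1})
- \frac{1}{2\bar C}\,\bar m^{n+1}\big|\bar w^{n+1}/\bar m^{n+1} - w^{n+1}/m^{n+1}\big|^2,
\]

while, by the definition of $w^{n+1}$,

\[
\langle w^{n+1},\nabla u^{n+1}\rangle + H(x,\nabla u^{n+1})\,m^{n+1} = -m^{n+1} H^*(x,-w^{n+1}/m^{n+1}).
\]

Therefore

\begin{align*}
B &\le \frac1n \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1} H^*(x,-\bar w^{n+1}/\bar m^{n+1}) - m^{n+1} H^*(x,-w^{n+1}/m^{n+1})\\
&\qquad - \frac{1}{2\bar C n} \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1}\big|\bar w^{n+1}/\bar m^{n+1} - w^{n+1}/m^{n+1}\big|^2 + \frac{C}{n^2}.
\tag{21}
\end{align*}

On the other hand, recalling the definition of $p$ in (17) and setting
$\bar p^{n+1} = p(\cdot,-\bar w^{n+1}/\bar m^{n+1})$, we can estimate $A$ as follows:

\begin{align*}
A &\le \int_0^T\!\!\int_{\mathbb T^d} -\langle \bar p^{n+1},\bar w^{n+1}\rangle - \bar m^{n+1} H(x,\bar p^{n+1})
+ \langle \bar p^{n+1},\bar w^n\rangle + \bar m^n H(x,\bar p^{n+1})\\
&= \frac1n \int_0^T\!\!\int_{\mathbb T^d} \langle \bar p^{n+1},\bar w^{n+1}\rangle + \bar m^{n+1} H(x,\bar p^{n+1})
- \langle \bar p^{n+1},w^{n+1}\rangle - m^{n+1} H(x,\bar p^{n+1})\\
&\le \frac1n \int_0^T\!\!\int_{\mathbb T^d} m^{n+1} H^*(x,-w^{n+1}/m^{n+1}) - \bar m^{n+1} H^*(x,-\bar w^{n+1}/\bar m^{n+1}).
\tag{22}
\end{align*}

Putting together (21) and (22) we find:

\[
\Phi(\bar m^{n+1},\bar w^{n+1}) - \Phi(\bar m^n,\bar w^n) \le -\frac{a_n}{2\bar C n} + \frac{C}{n^2},
\]

where
$a_n = \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1}\big|\bar w^{n+1}/\bar m^{n+1} - w^{n+1}/m^{n+1}\big|^2$. □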

In order to proceed, let us recall some basic estimates on the system (13), the proof of which
is postponed:

Lemma 2.4. For any $\alpha \in (0,1/2)$ there exists a constant $C > 0$ such that, for any
$n \in \mathbb N^*$,

\[
\|u^n\|_{C^{1+\alpha/2,2+\alpha}} + \|m^n\|_{C^{1+\alpha/2,2+\alpha}} \le C, \qquad m^n \ge 1/C,
\]

where $C^{1+\alpha/2,2+\alpha}$ is the usual Hölder space on $[0,T] \times \mathbb T^d$.

As a consequence, the $u^n$, the $m^n$ and the $w^n$ do not vary too much between two
consecutive steps:

Lemma 2.5. There exists a constant $C > 0$ such that

\[
\|u^{n+1} - u^n\|_\infty + \|\nabla u^{n+1} - \nabla u^n\|_\infty + \|m^{n+1} - m^n\|_\infty
+ \|w^{n+1} - w^n\|_\infty \le \frac{C}{n}.
\]
Proof. As $\bar m^n = ((n-1)\bar m^{n-1} + m^n)/n$, so that
$\bar m^n - \bar m^{n-1} = (m^n - \bar m^{n-1})/n$, where the $m^n$ (and thus the $\bar m^n$)
are uniformly bounded thanks to Lemma 2.4, we have by Lipschitz continuity of $f$ and $g$ that

\[
\sup_{t \in [0,T]} \|f(\cdot,\bar m^{n+1}(t)) - f(\cdot,\bar m^n(t))\|_\infty
+ \|g(\cdot,\bar m^{n+1}(T)) - g(\cdot,\bar m^n(T))\|_\infty \le \frac{C}{n}.
\tag{23}
\]

Thus, by comparison for the solution of the Hamilton-Jacobi equation, we get

\[
\|u^{n+1} - u^n\|_\infty \le \frac{C}{n}.
\tag{24}
\]

Let us set $z := u^{n+1} - u^n$. Then $z$ satisfies

\[
-\partial_t z - \Delta z + H(x,\nabla u^n + \nabla z) - H(x,\nabla u^n) = f(x,\bar m^n(t)) - f(x,\bar m^{n-1}(t)).
\]

Multiplying by $z$ and integrating over $[0,T] \times \mathbb T^d$ we find by (23) and (24):

\[
\Big[ -\int_{\mathbb T^d} \frac{z^2}{2} \Big]_0^T
+ \int_0^T\!\!\int_{\mathbb T^d} |\nabla z|^2 + z\big(H(x,\nabla u^n + \nabla z) - H(x,\nabla u^n)\big) \le \frac{C}{n^2}.
\]

Then we use the uniform bound on the $\nabla u^n$ given by Lemma 2.4 as well as (24) to get

\[
\int_0^T\!\!\int_{\mathbb T^d} \Big( |\nabla z|^2 - \frac{C}{n}\,|\nabla z| \Big) \le \frac{C}{n^2}.
\]

Thus

\[
\int_0^T\!\!\int_{\mathbb T^d} |\nabla z|^2 \le \frac{C}{n^2},
\]

which implies that $\|\nabla z\|_\infty \le C/n$ since
$\|\nabla^2 z\|_\infty + \|\partial_t \nabla z\|_\infty \le C$ by Lemma 2.4.

We argue in a similar way for $\mu := m^{n+1} - m^n$: $\mu$ satisfies

\[
\partial_t \mu - \Delta\mu - \mathrm{div}\big(\mu\,D_pH(x,\nabla u^{n+1})\big) - \mathrm{div}(R) = 0,
\]

where we have set $R = m^n\big(D_pH(x,\nabla u^{n+1}) - D_pH(x,\nabla u^n)\big)$. As
$\|R\|_\infty \le C/n$ by the previous step, we get the bound
$\|m^{n+1} - m^n\|_\infty \le C/n$ by standard parabolic estimates. This implies the bound on
$\|w^{n+1} - w^n\|_\infty$ by the definition of the $w^n$. □

Combining Lemma 2.4 with Lemma 2.5 we immediately obtain that the sequence $(a_n)$ defined
in Lemma 2.3 is slowly varying in $n$:

Corollary 2.6. There exists a constant $C > 0$ such that, for any $n \in \mathbb N^*$,

\[
|a_{n+1} - a_n| \le \frac{C}{n}.
\]

Proof of Theorem 2.1. From Lemma 2.3, we have for any $n \in \mathbb N^*$,

\[
\Phi(\bar m^{n+1},\bar w^{n+1}) - \Phi(\bar m^n,\bar w^n) \le -\frac{1}{C}\,\frac{a_n}{n} + \frac{C}{n^2},
\]

where
$a_n = \int_0^T\!\!\int_{\mathbb T^d} \bar m^{n+1}\big|\bar w^{n+1}/\bar m^{n+1} - w^{n+1}/m^{n+1}\big|^2$.

Since the potential $\Phi$ is bounded from below, the above inequality implies that

\[
\sum_{n \ge 1} a_n/n < +\infty.
\]

From Corollary 2.6, we also have, for any $n \in \mathbb N^*$,

\[
|a_{n+1} - a_n| \le \frac{C}{n}.
\]

Then Lemma 2.7 below implies that $\lim_{n\to\infty} a_n = 0$.

In particular we have, by Lemma 2.4:

\[
\lim_{n\to\infty} \int_0^T\!\!\int_{\mathbb T^d} \big|\bar w^n/\bar m^n - w^n/m^n\big|^2
\le C \lim_{n\to\infty} \int_0^T\!\!\int_{\mathbb T^d} \bar m^n \big|\bar w^n/\bar m^n - w^n/m^n\big|^2 = 0.
\]

This implies that the sequence $\{\bar w^n/\bar m^n - w^n/m^n\}_{n\in\mathbb N}$, which is
uniformly continuous by Lemma 2.4, uniformly converges to 0 on $[0,T] \times \mathbb T^d$.

Recall that, by Lemma 2.4, the sequence $\{(u^{n+1}, m^n, \bar m^n, \bar w^n)\}_{n\in\mathbb N}$
is pre-compact for the uniform convergence. Let $(u, m, \bar m, \bar w)$ be a cluster point of
the sequence $\{(u^{n+1}, m^n, \bar m^n, \bar w^n)\}_{n\in\mathbb N}$. Our aim is to show that
$(u,m)$ is a solution to the MFG system (12), that $\bar m = m$ and that
$\bar w = -m\,D_pH(\cdot,\nabla u)$.

Let $(n_i)_{i\in\mathbb N}$ be a subsequence such that
$(u^{n_i+1}, m^{n_i}, \bar m^{n_i}, \bar w^{n_i})$ uniformly converges to
$(u, m, \bar m, \bar w)$. By the estimates in Lemma 2.4, $D_pH(x,\nabla u^{n_i})$ converges
uniformly to $D_pH(x,\nabla u)$, so that by (15) and the fact that the sequence
$\{w^n/m^n - \bar w^n/\bar m^n\}$ converges to 0,

\[
-D_pH(x,\nabla u) = \frac{w}{m} = \frac{\bar w}{\bar m},
\tag{25}
\]

where $w$ denotes the uniform limit of the $w^{n_i}$. We now pass to the limit in (13) (in the
viscosity sense for the Hamilton-Jacobi equation and in the sense of distributions for the
Fokker-Planck equation) to get

\[
\begin{cases}
(i) & -\partial_t u - \Delta u + H(x,\nabla u(t,x)) = f(x,\bar m(t)), & (t,x) \in [0,T] \times \mathbb T^d,\\
(ii) & \partial_t m - \Delta m - \mathrm{div}\big(m\,D_pH(x,\nabla u)\big) = 0, & (t,x) \in [0,T] \times \mathbb T^d,\\
& m(0) = m_0, \qquad u(x,T) = g(x,\bar m(T)), & x \in \mathbb T^d.
\end{cases}
\tag{26}
\]

Letting $n \to +\infty$ in (16) we also have

\[
\partial_t \bar m - \Delta \bar m + \mathrm{div}(\bar w) = 0, \quad t \in [0,T], \qquad \bar m(0) = m_0.
\]

By (25), this means that $m$ and $\bar m$ are both solutions to the same Fokker-Planck equation.
Thus they are equal and $(u,m)$ is a solution to the MFG system.

If (9) holds, then the MFG system has a unique solution $(u,m)$, so that the pre-compact
sequence $\{(u^n,m^n)\}$ has a unique accumulation point $(u,m)$ and thus converges to
$(u,m)$. □


In the proof of Theorem 2.1, we have used the following Lemma, which can be found in [29].

Lemma 2.7. Consider a sequence of positive real numbers $\{a_n\}_{n\in\mathbb N}$ such that
$\sum_{n=1}^\infty a_n/n < +\infty$. Then we have

\[
\lim_{N\to\infty} \frac1N \sum_{n=1}^N a_n = 0.
\]

In addition, if there is a constant $C > 0$ such that $|a_n - a_{n+1}| \le \frac{C}{n}$, then
$\lim_{n\to\infty} a_n = 0$.

Proof. We reproduce the proof of [29] for the sake of completeness. For every $k \in \mathbb N$
define $b_k = \sum_{n=k}^\infty a_n/n$. Since $\sum_{n=1}^\infty a_n/n < +\infty$ we have
$\lim_{k\to\infty} b_k = 0$. So we have:

\[
\lim_{N\to\infty} \frac1N \sum_{k=1}^N b_k = 0,
\]

which yields the first result since:

\[
\sum_{n=1}^N a_n \le \sum_{k=1}^N b_k.
\]

For the second result, consider $\varepsilon > 0$. We know that for every $\lambda > 0$ we have:

\[
\lim_{N\to\infty} \Big( \frac1N + \frac{1}{N+1} + \cdots + \frac{1}{[(1+\lambda)N]} \Big) = \log(1+\lambda),
\]

where $[a]$ denotes the integer part of the real number $a$. So if $\lambda_\varepsilon > 0$ is
so small that $\log(1+\lambda_\varepsilon) < \frac{\varepsilon}{2C}$, then there exists
$N_\varepsilon \in \mathbb N$ so large that for $N \ge N_\varepsilon$ we have

\[
\frac1N + \frac{1}{N+1} + \cdots + \frac{1}{[(1+\lambda_\varepsilon)N]} < \frac{\varepsilon}{2C}.
\tag{27}
\]

Let $N \ge N_\varepsilon$. Assume for a while that $a_N > \varepsilon$. As
$|a_{k+1} - a_k| \le C/k$, (27) implies that $a_k > \frac{\varepsilon}{2}$ for
$N \le k \le [N(1+\lambda_\varepsilon)]$. Thus

\[
\frac{1}{[N(1+\lambda_\varepsilon)]} \sum_{k=1}^{[N(1+\lambda_\varepsilon)]} a_k
\ge \frac{\lambda_\varepsilon}{1+\lambda_\varepsilon}\,\frac{\varepsilon}{2}.
\]

Since the average $N^{-1}\sum_{k=1}^N a_k$ converges to zero, the above inequality cannot hold
for $N$ large enough. This implies that $a_N \le \varepsilon$ for $N$ sufficiently large, so that
$(a_k)$ converges to 0. □
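The role of the slow-variation condition in Lemma 2.7 can be seen on a simple sequence. The
sketch below uses an illustrative example which is not taken from [29]: $a_n = 1$ when $n$ is a
power of two and $a_n = 0$ otherwise (nonnegative rather than strictly positive). The weighted
series converges and the averages tend to zero, yet $a_n$ does not converge, because the bound
$|a_n - a_{n+1}| \le C/n$ fails.

```python
def a(n):
    """Indicator of the powers of two: a_n = 1 if n = 2^j for some j >= 0, else 0 (n >= 1)."""
    return 1.0 if n & (n - 1) == 0 else 0.0


N = 1 << 16

# Partial sum of a_n / n: a geometric series, bounded by 2.
weighted_sum = sum(a(n) / n for n in range(1, N + 1))

# Cesaro average (1/N) * sum_{n <= N} a_n: only about log2(N) terms are nonzero.
cesaro_average = sum(a(n) for n in range(1, N + 1)) / N

# Largest value of n * |a_{n+1} - a_n|: unbounded in N, so no constant C works.
max_scaled_jump = max(n * abs(a(n + 1) - a(n)) for n in range(1, N))
```

Here `a(N)` is still equal to 1 at the last index, while `max_scaled_jump` grows like `N`,
which is exactly what the second hypothesis of the lemma rules out.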


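Proof of Lemma 2.2. For simplicity of notation, we omit the $x$ dependence in the various
quantities. As by assumption (5) we have $\frac{1}{\bar C} I_d \le D^2_{pp}H \le \bar C I_d$,
$H^*$ is differentiable with respect to $q$ and the following inequality holds: for any
$q_1, q_2 \in \mathbb R^d$,

\[
\langle D_qH^*(q_1) - D_qH^*(q_2), q_1 - q_2\rangle \ge \frac{1}{\bar C}\,|q_1 - q_2|^2.
\]

Let us fix $p, q \in \mathbb R^d$ and let $\bar q \in \mathbb R^d$ be the maximum point in

\[
\max_{q' \in \mathbb R^d} \langle q',p\rangle - H^*(q') = H(p).
\]

Recall that $p = D_qH^*(\bar q)$ and thus $\bar q = D_pH(p)$. Then

\begin{align*}
H(p) + H^*(q) - \langle p,q\rangle &= H^*(q) - H^*(\bar q) - \langle q - \bar q, p\rangle\\
&= \int_0^1 \langle D_qH^*((1-t)\bar q + tq) - D_qH^*(\bar q), q - \bar q\rangle\,dt\\
&= \int_0^1 \frac1t\,\langle D_qH^*((1-t)\bar q + tq) - D_qH^*(\bar q), ((1-t)\bar q + tq) - \bar q\rangle\,dt\\
&\ge \int_0^1 \frac{t}{\bar C}\,|q - \bar q|^2\,dt = \frac{1}{2\bar C}\,|q - \bar q|^2.
\end{align*}

□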


Proof of Lemma 2.4. Given $\bar m^n \in C^0([0,T], \mathcal P(\mathbb T^d))$, the solution
$u^{n+1}$ is uniformly Lipschitz continuous. Hence any weak solution to the Fokker-Planck
equation is uniformly Hölder continuous in $C^0([0,T], \mathcal P(\mathbb T^d))$. This shows that
the right-hand side of the Hamilton-Jacobi equation is uniformly Hölder continuous; then the
Schauder estimates provide the bound in $C^{1+\alpha/2,2+\alpha}$ for $\alpha \in (0,1/2)$ on
the $u^n$. Plugging this estimate into the Fokker-Planck equation and using again the Schauder
estimates gives the bounds in $C^{1+\alpha/2,2+\alpha}$ on the $m^n$. The bound from below for
the $m^n$ comes from the strong maximum principle. □


3 The Fictitious Play for first order MFG systems 

We now consider the first order MFG system:

\[
\begin{cases}
(i) & -\partial_t u + H(x,\nabla u(t,x)) = f(x,m(t)), & (t,x) \in [0,T] \times \mathbb T^d,\\
(ii) & \partial_t m + \mathrm{div}\big(-m\,D_pH(x,\nabla u(t,x))\big) = 0, & (t,x) \in [0,T] \times \mathbb T^d,\\
& m(0) = m_0, \qquad u(x,T) = g(x,m(T)), & x \in \mathbb T^d.
\end{cases}
\tag{28}
\]

In contrast with second order MFG systems, we cannot expect the existence of classical
solutions: both the Hamilton-Jacobi equation and the Fokker-Planck equation have to be
understood in a generalized sense. In particular, the solutions of the Fictitious Play are not
smooth enough to justify the various computations of section 2. For this reason we introduce
another method, based on another potential, which also has the advantage that it can be adapted
to a finite population of players.

Let us start by recalling the notion of solution for (28). Following [25], we say that the pair
$(u,m)$ is a solution to the MFG system (28) if $u$ is a Lipschitz continuous viscosity solution
to (28)-(i) while $m \in L^\infty((0,T) \times \mathbb T^d)$ is a solution of (28)-(ii) in the
sense of distributions.

Under our standing assumptions (4), (5), (6), (7), there exists at least one solution $(u,m)$
to the mean field game system (28). If furthermore (9) holds, then the solution is unique (see
[25] and Theorem 5.1 in [11]).


3.1 The learning rule and the potential 

The learning rule is basically the same as for second order MFG systems: given a smooth initial
guess $m^0 : [0,T] \times \mathbb T^d \to \mathbb R$, we define by induction sequences
$u^n, m^n : [0,T] \times \mathbb T^d \to \mathbb R$ heuristically given by:

\[
\begin{cases}
(i) & -\partial_t u^{n+1} + H(x,\nabla u^{n+1}(t,x)) = f(x,\bar m^n(t)), & (t,x) \in [0,T] \times \mathbb T^d,\\
(ii) & \partial_t m^{n+1} + \mathrm{div}\big(-m^{n+1}D_pH(x,\nabla u^{n+1})\big) = 0, & (t,x) \in [0,T] \times \mathbb T^d,\\
& m^{n+1}(0) = m_0, \qquad u^{n+1}(x,T) = g(x,\bar m^n(T)), & x \in \mathbb T^d,
\end{cases}
\tag{29}
\]

where $\bar m^n(t,x) = \frac1n \sum_{k=1}^n m^k(t,x)$. While equation (29)-(i) is easy to
interpret, the meaning of (29)-(ii) is more delicate and, actually, would make little sense for
a finite population. For this reason we are going to rewrite the problem in a completely
different way, as a problem on the space of curves.

Let us fix the notation. Let $\Gamma = C^0([0,T], \mathbb{T}^d)$ be the set of curves. It is endowed with the usual
topology of uniform convergence and we denote by $\mathcal{B}(\Gamma)$ the associated $\sigma$-field. We define
$\mathcal{P}(\Gamma)$ as the set of Borel probability measures on $\mathcal{B}(\Gamma)$. We view $\Gamma$ and $\mathcal{P}(\Gamma)$ as the sets of pure
and mixed strategies for the players. For any $t \in [0,T]$ the evaluation map $e_t : \Gamma \to \mathbb{T}^d$, defined
by

\[ e_t(\gamma) = \gamma(t) \qquad \forall \gamma \in \Gamma, \]

is continuous and thus measurable. For any $\eta \in \mathcal{P}(\Gamma)$ we define $m^\eta(t) = e_t \sharp \eta$ as the push forward
of the measure $\eta$ to $\mathbb{T}^d$, i.e.

\[ m^\eta(t)(A) = \eta(\{\gamma \in \Gamma \mid \gamma(t) \in A\}) \]

for any measurable set $A \subset \mathbb{T}^d$. We denote by $\mathcal{P}_0(\Gamma)$ the set of probability measures $\eta$ on $\Gamma$ such
that $e_0 \sharp \eta = m_0$. Note that $\mathcal{P}_0(\Gamma)$ is the set of strategies compatible with the initial density $m_0$.
Given an initial time $t \in [0,T]$ and an initial position $x$, it is convenient to define the cost of a
path $\gamma \in C^0([t,T], \mathbb{T}^d)$ paid by a small player starting from that position when the distribution
of strategies of the other players is $\eta$. It is given by


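\[
J(t, x, \gamma, \eta) :=
\begin{cases}
\displaystyle \int_t^T \big[ L(\gamma(s), \dot\gamma(s)) + f(\gamma(s), m^\eta(s)) \big]\, ds + g(\gamma(T), m^\eta(T)) & \text{if } \gamma \in H^1([t,T], \mathbb{T}^d), \\
+\infty & \text{otherwise,}
\end{cases}
\]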


where $L(x,v) := H^*(x,-v)$ and $H^*$ is the Fenchel conjugate of $H$ with respect to the last
variable. If $t = 0$, we simply abbreviate $J(x, \gamma, \eta) := J(0, x, \gamma, \eta)$. We note for later use that
$J(t, x, \cdot, \eta)$ is lower semi-continuous on $\Gamma$.

We now define the Fictitious Play. We start with an initial configuration $\eta^0 \in \mathcal{P}_0(\Gamma)$ (the
belief, before the first step, of a typical player on the actions of the other players). We then build
by induction the sequences $(\theta^n)$ and $(\eta^n)$ of $\mathcal{P}(\Gamma)$, $\eta^n$ being interpreted as the belief at the end
of stage $n$ of a typical player on the actions of the other agents and $\theta^{n+1}$ as the distribution of
strategies of the players when they play optimally in the game against $\eta^n$. More precisely, for
any $x \in \mathbb{T}^d$, let $\bar\gamma^{n+1}_x \in H^1([0,T], \mathbb{T}^d)$ be an optimal solution to

\[ \inf_{\gamma \in H^1,\ \gamma(0) = x} J(x, \gamma, \eta^n). \]




In view of our coercivity assumptions on $H$ and the definition of $L$, the optimum is known to
exist. Moreover, by the measurable selection theorem we can (and will) assume that the map
$x \to \bar\gamma^{n+1}_x$ is Borel measurable. We then consider the measure $\theta^{n+1} \in \mathcal{P}_0(\Gamma)$ defined by

\[ \theta^{n+1} := \bar\gamma^{n+1}_\cdot \sharp m_0 \]

and set

\[ \eta^{n+1} := \frac{1}{n+1} \sum_{k=1}^{n+1} \theta^k = \eta^n + \frac{1}{n+1} (\theta^{n+1} - \eta^n). \tag{30} \]
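The two expressions in (30) can be checked on finitely supported measures. The following is a minimal sketch, assuming that a measure on curves is stored as a dictionary from (hypothetical) curve labels to exact rational weights; the function name `average_update` is ours and not part of the original construction.

```python
from fractions import Fraction

def average_update(eta, theta, n):
    """Recursive form of (30): eta^{n+1} = eta^n + (theta^{n+1} - eta^n) / (n + 1)."""
    step = Fraction(1, n + 1)
    labels = set(eta) | set(theta)
    return {k: eta.get(k, 0) + step * (theta.get(k, 0) - eta.get(k, 0)) for k in labels}

# Three hypothetical best-reply distributions theta^1, theta^2, theta^3 on curves "a" and "b".
thetas = [
    {"a": Fraction(1)},
    {"b": Fraction(1)},
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
]

eta = {"a": Fraction(1)}  # eta^0; it is overwritten at n = 0 because the step is then 1
for n, theta in enumerate(thetas):
    eta = average_update(eta, theta, n)

# Arithmetic mean of theta^1, ..., theta^3, i.e. the first expression in (30).
mean = {k: sum(t.get(k, 0) for t in thetas) / len(thetas) for k in ("a", "b")}
```

Both forms give the same measure; in particular the initial belief $\eta^0$ no longer appears in $\eta^n$ for $n \geq 1$.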


As in Section 2, we assume that our MFG is potential, i.e., that there exist potential
functions $F, G : \mathcal{P}(\mathbb{T}^d) \to \mathbb{R}$ such that:

\[ f(x, m) = \frac{\delta F}{\delta m}(x, m), \qquad g(x, m) = \frac{\delta G}{\delta m}(x, m). \tag{31} \]


Here is our main convergence result. 

Theorem 3.1. Assume that (4), (5), (6), (7) and (31) hold. Then the sequence $(\eta^n, \theta^n)$ is
pre-compact in $\mathcal{P}(\Gamma) \times \mathcal{P}(\Gamma)$ and any cluster point $(\bar\eta, \bar\theta)$ satisfies the following: $\bar\theta = \bar\eta$ and, if we
set

\[ \bar m(t) := e_t \sharp \bar\eta, \qquad \bar u(t,x) := \inf_{\gamma \in H^1,\ \gamma(t) = x} J(t, x, \gamma, \bar\eta), \tag{32} \]

then the pair $(\bar u, \bar m)$ is a solution to the MFG system (28).

If furthermore (9) holds, then the entire sequence $(\eta^n, \theta^n)$ converges.
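To build intuition for this statement, the following minimal sketch runs a Fictitious Play in a static game with finitely many actions. This is only an analogue of the curve-space game above, not the construction of this section: we assume (hypothetically) that the cost of action $i$ under the population distribution $m$ is $f_i(m) = m_i$, which derives from the potential $F(m) = \frac{1}{2} \sum_i m_i^2$, so that the unique equilibrium is the uniform distribution. The name `fictitious_play` is ours.

```python
def fictitious_play(num_actions, num_steps):
    """Static toy Fictitious Play: best reply to the averaged belief, then averaging."""
    eta = [1.0] + [0.0] * (num_actions - 1)   # arbitrary initial belief eta^0
    for n in range(num_steps):
        costs = list(eta)                      # assumed congestion cost f_i(eta) = eta_i
        best = costs.index(min(costs))         # best reply; ties go to the lowest index
        theta = [1.0 if i == best else 0.0 for i in range(num_actions)]
        # averaging rule, the finite analogue of (30)
        eta = [e + (t - e) / (n + 1) for e, t in zip(eta, theta)]
    return eta

eta = fictitious_play(num_actions=3, num_steps=3000)
```

In this toy model each step sends all players to a currently least used action, so the averaged belief stays within a distance of order $1/n$ of the uniform distribution.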

The proof of Theorem 3.1 is postponed to the next subsection. As for the second order
problem, the key idea is that our MFG system has a potential. However, in contrast with the
second order case, the potential is now written on the space of probability measures on curves
and reads, for $\eta \in \mathcal{P}(\Gamma)$,

\[ \Phi(\eta) := \int_\Gamma \int_0^T L(\gamma(t), \dot\gamma(t))\, dt\, d\eta(\gamma) + \int_0^T F(e_t \sharp \eta)\, dt + G(e_T \sharp \eta). \tag{33} \]


Note that $\Phi(\eta)$ is well-defined and belongs to $(-\infty, +\infty]$. The potential defined above is
reminiscent of [12] or [10]. For instance, in [10] (but for MFG systems with a local dependence and
under the monotonicity condition (9)) it is proved that the MFG equilibrium can be found as
a global minimum of $\Phi$. We will show in the proof of Theorem 3.1 that the limit measure $\bar\eta$ is
characterized by the optimality condition

\[ \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\bar\eta) \leq \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\theta) \qquad \forall \theta \in \mathcal{P}_0(\Gamma). \]

Before proving that $\Phi$ is a potential for the game, let us start with preliminary remarks. The
first one explains that the optimal curves are uniformly Lipschitz continuous.

Lemma 3.2. There exists a constant $C > 0$ such that, for any $x \in \mathbb{T}^d$ and any $n \geq 0$,

\[ \| \dot{\bar\gamma}^{n+1}_x \|_\infty \leq C. \tag{34} \]

In particular, the sequences $(\eta^n)$ and $(\theta^n)$ are tight and

\[ \mathbf{d}_1(e_t \sharp \eta^{n+1}, e_{t'} \sharp \eta^{n+1}) \leq C |t - t'| \qquad \forall t, t' \in [0,T]. \]

Proof. Under our assumptions on $H$, $f$ and $g$, it is known that the $(u^n)$ are uniformly Lipschitz
continuous (see, for instance, the appendix of [11]). As a byproduct the optimal solutions are
also uniformly Lipschitz continuous thanks to the classical link between the derivative of the
value function and the optimal trajectories (Theorem 6.4.8 of [9]): this is (34). The rest of the
proof is a straightforward consequence of (34). □

Next we compute the derivative of $\Phi$ with respect to the measure $\eta$. Let us point out that,
since $\Phi$ is not continuous and can take the value $+\infty$, the derivative, although defined by the
formula (10), has to be taken only at points and directions along which $\Phi$ is finite. This is in
particular the case for the $\eta^n$ and the $\theta^n$.

Lemma 3.3. For any $\eta, \eta' \in \mathcal{P}(\Gamma)$ such that $\Phi(\eta), \Phi(\eta') < +\infty$, we have

\[ \frac{\delta\Phi}{\delta\eta}(\eta)(\eta' - \eta) = \int_\Gamma J(\gamma(0), \gamma, \eta)\, d(\eta' - \eta)(\gamma). \]

Proof. This is a straightforward application of the definition of $\Phi$ in (33) and of the continuous
differentiability of $F$ and $G$. □

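By abuse of notation, we also define $\frac{\delta\Phi}{\delta\eta}(\eta)(\theta)$ for a positive Borel measure $\theta$ on $\Gamma$ by setting

\[ \frac{\delta\Phi}{\delta\eta}(\eta)(\theta) := \int_\Gamma J(\gamma(0), \gamma, \eta)\, d\theta(\gamma). \]

Note that, as $J$ is bounded below, the quantity $\frac{\delta\Phi}{\delta\eta}(\eta)(\theta)$ is well-defined and belongs to $(-\infty, +\infty]$.
Next we translate the optimality property of $\bar\gamma^n_x$ into an optimality property of $\theta^n$.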

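Lemma 3.4. For any $n \in \mathbb{N}^*$,

\[ \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) = \int_{\mathbb{T}^d} J(x, \bar\gamma^{n+1}_x, \eta^n)\, m_0(x)\, dx = \min_{\theta \in \mathcal{P}_0(\Gamma)} \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta). \]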

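Proof. The first equality is just the definition of $\theta^{n+1}$. It remains to check that, for any $\theta \in \mathcal{P}_0(\Gamma)$,

\[ \int_{\mathbb{T}^d} J(x, \bar\gamma^{n+1}_x, \eta^n)\, m_0(x)\, dx \leq \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\theta(\gamma). \]

As $m_0 = e_0 \sharp \theta$, we can disintegrate $\theta$ into $\theta = \int_{\mathbb{T}^d} \theta_x\, dm_0(x)$, where $\theta_x \in \mathcal{P}(\Gamma)$ with $\gamma(0) = x$ for
$\theta_x$-a.e. $\gamma$. By optimality of $\bar\gamma^{n+1}_x$ we have, for $m_0$-a.e. $x \in \mathbb{T}^d$,

\[ J(x, \bar\gamma^{n+1}_x, \eta^n) \leq \int_\Gamma J(x, \gamma, \eta^n)\, d\theta_x(\gamma) \]

and therefore, integrating with respect to $m_0$:

\[ \int_{\mathbb{T}^d} J(x, \bar\gamma^{n+1}_x, \eta^n)\, m_0(x)\, dx \leq \int_{\mathbb{T}^d} \int_\Gamma J(x, \gamma, \eta^n)\, d\theta_x(\gamma)\, m_0(x)\, dx = \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\theta(\gamma). \]

□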
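The disintegration argument of this proof can be checked on a finite toy model. The sketch below assumes finitely many initial positions and, for each of them, finitely many candidate curves with precomputed costs; the names `m0`, `costs` and `theta_x` are hypothetical stand-ins for $m_0$, $J(x, \gamma, \eta^n)$ and the disintegration $(\theta_x)$, and the numerical values are made up for illustration.

```python
def best_reply_value(m0, costs):
    """Integral against m0 of the pointwise minimal cost (left-hand side of the inequality)."""
    return sum(m0[x] * min(costs[x].values()) for x in m0)

def mixed_value(m0, costs, theta_x):
    """Integral of the cost against theta = sum_x m0[x] * theta_x[x] (right-hand side)."""
    return sum(
        m0[x] * sum(theta_x[x][c] * costs[x][c] for c in costs[x])
        for x in m0
    )

# Two initial positions, two candidate curves "a" and "b" from each of them.
m0 = {"x1": 0.5, "x2": 0.5}
costs = {"x1": {"a": 1.0, "b": 3.0}, "x2": {"a": 2.0, "b": 0.5}}
theta_x = {"x1": {"a": 0.25, "b": 0.75}, "x2": {"a": 0.5, "b": 0.5}}

lhs = best_reply_value(m0, costs)
rhs = mixed_value(m0, costs, theta_x)
```

Since the functional is linear in $\theta$ and the constraint only fixes the initial marginal, concentrating each conditional measure on a minimizer can never increase the value.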
The next proposition states that the potential $\Phi$ is indeed almost decreasing along the
sequence $(\eta^n)$.

Proposition 3.5. There is a constant $C > 0$ such that, for any $n \in \mathbb{N}^*$, we have

\[ \Phi(\eta^{n+1}) \leq \Phi(\eta^n) + \frac{1}{n+1} \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1} - \eta^n) + \frac{C}{(n+1)^2}, \tag{35} \]

where

\[ \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1} - \eta^n) = \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d(\theta^{n+1} - \eta^n)(\gamma) \leq 0. \tag{36} \]

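Proof. Recalling (30), we have

\[
\begin{aligned}
\Phi(\eta^{n+1}) - \Phi(\eta^n) &= \int_0^1 \frac{\delta\Phi}{\delta\eta}((1-s)\eta^n + s\eta^{n+1})(\eta^{n+1} - \eta^n)\, ds \\
&= \frac{1}{n+1} \int_0^1 \frac{\delta\Phi}{\delta\eta}((1-s)\eta^n + s\eta^{n+1})(\theta^{n+1} - \eta^n)\, ds.
\end{aligned}
\tag{37}
\]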

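Let us estimate the right-hand side of this equality. For any $s \in [0,1]$, Lemma 3.3 states that

\[
\begin{aligned}
\frac{\delta\Phi}{\delta\eta}((1-s)\eta^n + s\eta^{n+1})(\theta^{n+1} - \eta^n) &= \int_\Gamma J(\gamma(0), \gamma, (1-s)\eta^n + s\eta^{n+1})\, d(\theta^{n+1} - \eta^n)(\gamma) \\
&= \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d(\theta^{n+1} - \eta^n)(\gamma) + R(s)
\end{aligned}
\tag{38}
\]

where, by the definition of $J$ and the Lipschitz continuity of $f$ and $g$,

\[
\begin{aligned}
R(s) &= \int_\Gamma \int_0^T \big( f(\gamma(t), e_t \sharp ((1-s)\eta^n + s\eta^{n+1})) - f(\gamma(t), e_t \sharp \eta^n) \big)\, dt\, d(\theta^{n+1} - \eta^n)(\gamma) \\
&\quad + \int_\Gamma \big( g(\gamma(T), e_T \sharp ((1-s)\eta^n + s\eta^{n+1})) - g(\gamma(T), e_T \sharp \eta^n) \big)\, d(\theta^{n+1} - \eta^n)(\gamma) \\
&\leq C \sup_{t \in [0,T]} \mathbf{d}_1\big( e_t \sharp ((1-s)\eta^n + s\eta^{n+1}),\ e_t \sharp \eta^n \big).
\end{aligned}
\tag{39}
\]

Note that, by the definition of $\mathbf{d}_1$, we have for any $t \in [0,T]$,

\[
\begin{aligned}
\mathbf{d}_1\big( e_t \sharp ((1-s)\eta^n + s\eta^{n+1}),\ e_t \sharp \eta^n \big)
&\leq \sup_\xi \left[ \int_{\mathbb{T}^d} \xi(x)\, d\big(e_t \sharp ((1-s)\eta^n + s\eta^{n+1})\big)(x) - \int_{\mathbb{T}^d} \xi(x)\, d(e_t \sharp \eta^n)(x) \right] \\
&\leq s \sup_\xi \left[ \int_{\mathbb{T}^d} \xi(x)\, d(e_t \sharp \eta^{n+1})(x) - \int_{\mathbb{T}^d} \xi(x)\, d(e_t \sharp \eta^n)(x) \right] \\
&\leq \frac{s}{n+1} \sup_\xi \int_{\mathbb{T}^d} \xi(x)\, d\big(e_t \sharp (\theta^{n+1} - \eta^n)\big)(x) \\
&\leq \frac{1}{n+1} \sup_\xi \int_{\mathbb{T}^d} (\xi(x) - \xi(0))\, d(e_t \sharp \theta^{n+1} - e_t \sharp \eta^n)(x) \leq \frac{C}{n+1},
\end{aligned}
\]

where the supremum is taken over the set of Lipschitz maps $\xi : \mathbb{T}^d \to \mathbb{R}$ with Lipschitz constant
not larger than 1. Therefore

\[ \Phi(\eta^{n+1}) - \Phi(\eta^n) \leq \frac{1}{n+1} \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d(\theta^{n+1} - \eta^n)(\gamma) + \frac{C}{(n+1)^2}, \]

where the first term in the right-hand side is nonpositive thanks to Lemma 3.4. □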
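The key estimate of this proof is that one averaging step moves the time marginals by at most $C/(n+1)$ in the distance $\mathbf{d}_1$. A minimal numerical sketch of this fact follows, under the simplifying assumption that the measures are finitely supported on the real line (rather than on the torus), where the distance is the integral of the difference of the cumulative distribution functions. The names `w1` and `mix` are ours.

```python
def w1(atoms_p, atoms_q):
    """Distance d_1 between finitely supported measures on R given as lists of (point, weight)."""
    pts = sorted({x for x, _ in atoms_p} | {x for x, _ in atoms_q})
    total = cdf_p = cdf_q = 0.0
    for left, right in zip(pts, pts[1:]):
        cdf_p += sum(w for x, w in atoms_p if x == left)
        cdf_q += sum(w for x, w in atoms_q if x == left)
        total += abs(cdf_p - cdf_q) * (right - left)   # integral of |F_p - F_q| on [left, right]
    return total

def mix(eta, theta, n):
    """One averaging step: (1 - 1/(n+1)) * eta + 1/(n+1) * theta."""
    step = 1.0 / (n + 1)
    return [(x, (1.0 - step) * w) for x, w in eta] + [(x, step * w) for x, w in theta]

eta = [(0.0, 0.5), (1.0, 0.5)]    # hypothetical time marginal of eta^n
theta = [(0.25, 1.0)]             # hypothetical time marginal of theta^{n+1}
n = 3
moved = w1(mix(eta, theta, n), eta)
```

In this one-dimensional setting the displacement equals the distance between `theta` and `eta` divided by $n+1$, so it is bounded by the diameter of the support divided by $n+1$.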
3.2 Convergence of the Fictitious Play

In this subsection, we prove Theorem 3.1. Recall that Lemma 3.2 states that the sequence $(\eta^n)$
is tight. The next lemma characterizes its cluster points:

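Lemma 3.6. Any cluster point $\bar\eta$ of the sequence $(\eta^n)$ satisfies

\[ \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\bar\eta) \leq \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\theta) \qquad \forall \theta \in \mathcal{P}_0(\Gamma), \tag{40} \]

which means that $\bar\eta$-a.e. $\bar\gamma$ is optimal for the map $\gamma \to J(\gamma(0), \gamma, \bar\eta)$ under the constraint
$\gamma(0) = \bar\gamma(0)$.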

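Proof. Let us define:

\[
\begin{aligned}
a^{n+1} := -\frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1} - \eta^n) &= -\int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d(\theta^{n+1} - \eta^n)(\gamma) \\
&= \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\eta^n(\gamma) - \min_{\theta \in \mathcal{P}_0(\Gamma)} \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\theta(\gamma),
\end{aligned}
\]

where the last equality comes from Lemma 3.4. Then, according to Proposition 3.5, the sequence
$(a^n)$ is non-negative and, by (35), the quantity $\sum_k a^k / k$ is finite (because $\Phi$ is bounded below).
Therefore by Lemma 2.7 we have:

\[ \lim_{N \to +\infty} \frac{1}{N} \sum_{k=1}^N a^k = 0. \tag{41} \]

Let us now check that $|a^{n+1} - a^{n+2}| \leq C/n$ for some constant $C$ (which may change from line to
line). By arguments similar to the ones in the proof of Proposition 3.5, we have, for any $\theta \in \mathcal{P}_0(\Gamma)$,

\[ \left| \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\theta) - \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta) \right| \leq \frac{C}{n}. \tag{42} \]

On the other hand, by optimality of $\theta^{n+1}$ and $\theta^{n+2}$ in Lemma 3.4 and (42), we have

\[
\begin{aligned}
\frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) &= \min_{\theta \in \mathcal{P}_0(\Gamma)} \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\theta(\gamma) \leq \int_\Gamma J(\gamma(0), \gamma, \eta^n)\, d\theta^{n+2}(\gamma) \\
&\leq \int_\Gamma J(\gamma(0), \gamma, \eta^{n+1})\, d\theta^{n+2}(\gamma) + C/n = \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\theta^{n+2}) + C/n \\
&= \min_{\theta \in \mathcal{P}_0(\Gamma)} \int_\Gamma J(\gamma(0), \gamma, \eta^{n+1})\, d\theta(\gamma) + C/n \\
&\leq \int_\Gamma J(\gamma(0), \gamma, \eta^{n+1})\, d\theta^{n+1}(\gamma) + C/n \leq \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) + C/n,
\end{aligned}
\]

which proves that

\[ \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) - \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\theta^{n+2}) \right| \leq C/n. \]

So we have:

\[
\begin{aligned}
|a^{n+1} - a^{n+2}| &= \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\eta^n - \theta^{n+1}) - \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\eta^{n+1} - \theta^{n+2}) \right| \\
&\leq \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\eta^n) - \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\eta^{n+1}) \right| + \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) - \frac{\delta\Phi}{\delta\eta}(\eta^{n+1})(\theta^{n+2}) \right| \\
&\leq \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\eta^n - \eta^{n+1}) \right| + C/n = \frac{1}{n+1} \left| \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1} - \eta^n) \right| + C/n \leq C/n.
\end{aligned}
\]

By (41) and the above estimate, we conclude that $a^n \to 0$ thanks to Lemma 2.7.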

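Let now $\bar\eta$ be any cluster point of the sequence $(\eta^n)$. Let us check that (40) holds. Let
$\theta \in \mathcal{P}_0(\Gamma)$. Then, from Lemma 3.4, for every $n \in \mathbb{N}$ we have:

\[ \frac{\delta\Phi}{\delta\eta}(\eta^n)(\eta^n) - a^{n+1} = \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta^{n+1}) \leq \frac{\delta\Phi}{\delta\eta}(\eta^n)(\theta). \]

If $(\eta^{n_i})_{i \in \mathbb{N}}$ is such that $\eta^{n_i} \to \bar\eta$, then:

\[ \forall \gamma \in \Gamma: \quad |J(\gamma(0), \gamma, \bar\eta) - J(\gamma(0), \gamma, \eta^{n_i})| \leq K \sup_{t \in [0,T]} \mathbf{d}_1(e_t \sharp \eta^{n_i}, e_t \sharp \bar\eta), \]

where the last term tends to 0 because the maps $t \to e_t \sharp \eta^{n_i}$ are uniformly continuous (from
Lemma 3.2) and converge pointwise (and thus uniformly) to $t \to e_t \sharp \bar\eta$. This yields that
$\left( \frac{\delta\Phi}{\delta\eta}(\eta^{n_i})(\theta) \right)$ converges to $\frac{\delta\Phi}{\delta\eta}(\bar\eta)(\theta)$. On the other hand, by lower semicontinuity of the map
$\gamma \to J(\gamma(0), \gamma, \bar\eta)$ on $\Gamma$, we have

\[ \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\bar\eta) \leq \liminf_i \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\eta^{n_i}) = \liminf_i \frac{\delta\Phi}{\delta\eta}(\eta^{n_i})(\eta^{n_i}), \]

which, since $a^n \to 0$, proves (40).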

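Let us check that $\bar\eta$-a.e. $\bar\gamma$ is optimal for the map $\gamma \to J(\gamma(0), \gamma, \bar\eta)$ under the constraint
$\gamma(0) = \bar\gamma(0)$. Let $\theta = \int_{\mathbb{T}^d} \delta_{\tilde\gamma_x}\, m_0(x)\, dx$ where $\tilde\gamma_x$ is (a measurable selection of) an optimal solution
for $\gamma \to J(x, \gamma, \bar\eta)$ under the constraint $\gamma(0) = x$. If we disintegrate $\bar\eta$ into $\bar\eta = \int_{\mathbb{T}^d} \bar\eta_x\, m_0(x)\, dx$,
then, for $m_0$-a.e. $x$ and $\bar\eta_x$-a.e. $\gamma$ we have

\[ J(x, \tilde\gamma_x, \bar\eta) \leq J(x, \gamma, \bar\eta). \tag{43} \]

Integrating over $\bar\eta_x$ and then against $m_0$ then implies that

\[ \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\theta) = \int_{\mathbb{T}^d} J(x, \tilde\gamma_x, \bar\eta)\, m_0(x)\, dx \leq \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\bar\eta(\gamma) = \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\bar\eta). \]

As the reverse inequality always holds by (40), this proves that there must be an equality in (43) a.e.,
which proves the claim. □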

Proof of Theorem 3.1. Let $(\bar\eta, \bar\theta)$ be the limit of a converging subsequence $(\eta^{n_i}, \theta^{n_i})$. We set

\[ \bar u(t,x) := \inf_{\gamma \in \Gamma,\ \gamma(t) = x} J(t, x, \gamma, \bar\eta) \qquad \text{and} \qquad \bar m(t) := e_t \sharp \bar\eta. \]

By standard arguments in optimal control, we know that $\bar u$ is a viscosity solution to (28)-(i) with
terminal condition $\bar u(T,x) = g(x, \bar m(T))$. Moreover, $\bar u$ is Lipschitz continuous and semiconcave
(cf. for instance Lemma 5.2 in [11]).

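It remains to check that $\bar m$ satisfies (28)-(ii). By Lemma 3.6, we know that

\[ \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\bar\eta) \leq \frac{\delta\Phi}{\delta\eta}(\bar\eta)(\theta) \qquad \forall \theta \in \mathcal{P}_0(\Gamma), \]

which means that $\bar\eta$-a.e. $\bar\gamma$ is optimal for the map $\gamma \to J(\gamma(0), \gamma, \bar\eta)$ under the constraint
$\gamma(0) = \bar\gamma(0)$. Following Theorem 6.4.9 in [9], the optimal solution for $J(x, \cdot, \bar\eta)$ is unique at any
point of differentiability of $\bar u(0, \cdot)$ (let us call it $\bar\gamma_x$). Disintegrating $\bar\eta$ into $\bar\eta = \int_{\mathbb{T}^d} \bar\eta_x\, dm_0(x)$, we
have therefore, since $m_0$ is absolutely continuous,

\[ \bar\eta_x = \delta_{\bar\gamma_x} \qquad \text{for } m_0\text{-a.e. } x \in \mathbb{T}^d, \]

so that

\[ \bar\eta = \int_{\mathbb{T}^d} \delta_{\bar\gamma_x}\, m_0(x)\, dx \qquad \text{and} \qquad \bar m(t) = \bar\gamma_\cdot(t) \sharp m_0 \quad \forall t \in [0,T]. \tag{44} \]

Let us also recall that the derivative of $\bar u(t, \cdot)$ exists along the optimal solution $\bar\gamma_x$ and that

\[ \dot{\bar\gamma}_x(t) = -D_pH(\bar\gamma_x(t), \nabla \bar u(t, \bar\gamma_x(t))) \qquad \forall t \in (0,T] \]

(see Theorems 6.4.7 and 6.4.8 of [9]). This proves that $\bar m$ is a solution in the sense of distributions
of (28)-(ii) (where we denote by $\nabla \bar u$ any fixed Borel measurable selection of the map $(t,x) \to
D^* \bar u(t,x)$, the set of reachable gradients of $\bar u$ at $(t,x)$, see [9]). Proposition A.1 in the appendix
states that (28)-(ii) has a unique solution and that this solution has a density in $L^\infty$: thus $\bar m$ is
in $L^\infty$, which shows that the pair $(\bar u, \bar m)$ is a solution of the MFG system (28).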

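In order to identify the cluster point $\bar\theta$, let us recall that $\theta^n$ is defined by

\[ \theta^n = \bar\gamma^n_\cdot \sharp m_0, \]

where, for any $x \in \mathbb{T}^d$, $\bar\gamma^n_x$ is a minimum of $J(x, \cdot, \eta^{n-1})$ under the constraint $\gamma(0) = x$. Since
$\eta^n - \eta^{n-1} = (\theta^n - \eta^{n-1})/n$, the sequence $(\eta^{n_i - 1})$ also converges to $\bar\eta$. As the
criterion $J(x, \cdot, \eta^{n_i - 1})$ $\Gamma$-converges to $J(x, \cdot, \bar\eta)$ and since at any point of differentiability of $\bar u(0, \cdot)$
the optimal solution $\bar\gamma_x$ is unique, standard compactness arguments show that $(\bar\gamma^{n_i}_x)$ converges
to $\bar\gamma_x$ for a.e. $x \in \mathbb{T}^d$. Therefore $(\theta^{n_i})$ converges to $\bar\gamma_\cdot \sharp m_0$, which is nothing but $\bar\eta$ by (44). So we
conclude that $\bar\theta = \bar\eta$.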

Finally, if (9) holds, then we claim that $\bar\eta$ is independent of the chosen subsequence. Indeed,
since from its very definition the dependence with respect to $\bar\eta$ of $J(x, \gamma, \bar\eta)$ is only through the
family of measures $(\bar m(t) = e_t \sharp \bar\eta)$ and since, by (9), there exists a unique solution to the MFG
system and thus $\bar m$ is uniquely defined, $J(x, \gamma, \bar\eta)$ is independent of the choice of the subsequence.
Then $\bar\gamma_x$ defined above is also independent of the subsequence, which characterizes $\bar\eta$ in a unique
way thanks to (44). Therefore the entire sequence $(\eta^n, \theta^n)$ converges to $(\bar\eta, \bar\eta)$. □

Remark 3.7. The proof shows that a measure $\bar\eta \in \mathcal{P}_0(\Gamma)$ which satisfies (40) can be understood
as the representation of an MFG equilibrium. Indeed, if we define $(\bar u, \bar m)$ as in (32), then $(\bar u, \bar m)$ is
a solution to the MFG system (28). Conversely, if $(\bar u, \bar m)$ is a solution to the MFG system (28),
then the relation (44) identifies uniquely a measure $\bar\eta \in \mathcal{P}_0(\Gamma)$. For this reason, we call such a
measure an equilibrium measure.


3.3 The Learning Procedure in N-Players games


In this part we show that the Fictitious Play in the Mean Field Game with a large (but finite)
number of players $N \in \mathbb{N}$ converges in some sense to the equilibrium of our Mean Field Game with
an infinite number of players. For every $N \in \mathbb{N}$, fix a sequence of initial states $x^N_1, x^N_2, \dots, x^N_N \in \mathbb{T}^d$
such that:

\[ \lim_{N \to \infty} \mathbf{d}_1(m^N_0, m_0) = 0, \]

where $m^N_0 := \frac{1}{N} \sum_{i=1}^N \delta_{x^N_i}$ is the empirical measure associated with the $\{x^N_i\}_{i=1,\dots,N}$.

As in the case of an infinite population, let us define the sequences $\eta^{n,N}, \theta^{n,N} \in \mathcal{P}(\Gamma)$, for $n \in \mathbb{N}^*$,
in the following way:

\[
\begin{aligned}
\eta^{n+1,N} &= \frac{1}{n+1} \big( \theta^{1,N} + \theta^{2,N} + \dots + \theta^{n+1,N} \big), \\
\theta^{n+1,N} &= \frac{1}{N} \Big( \delta_{\gamma^{n+1,N}_{x^N_1}} + \delta_{\gamma^{n+1,N}_{x^N_2}} + \dots + \delta_{\gamma^{n+1,N}_{x^N_N}} \Big),
\end{aligned}
\tag{45}
\]
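where $\gamma^{n+1,N}_{x^N_i}$ is an optimal path which minimizes $J(x^N_i, \cdot, \eta^{n,N})$. As before one can show that if

\[
\begin{aligned}
a^{n+1,N} := -\frac{\delta\Phi}{\delta\eta}(\eta^{n,N})(\theta^{n+1,N} - \eta^{n,N}) &= -\int_\Gamma J(\gamma(0), \gamma, \eta^{n,N})\, d(\theta^{n+1,N} - \eta^{n,N})(\gamma) \\
&= \int_\Gamma J(\gamma(0), \gamma, \eta^{n,N})\, d\eta^{n,N}(\gamma) - \min_{\theta \in \mathcal{P}(\Gamma),\ e_0 \sharp \theta = m^N_0} \int_\Gamma J(\gamma(0), \gamma, \eta^{n,N})\, d\theta(\gamma),
\end{aligned}
\]

then we have $\lim_{n \to \infty} a^{n,N} = 0$. This proves that any accumulation distribution $\bar\eta^N$ of the
sequence $\{\eta^{n,N}\}_{n \in \mathbb{N}^*}$ satisfies:

\[ \int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\bar\eta^N(\gamma) = \min_{\theta \in \mathcal{P}(\Gamma),\ e_0 \sharp \theta = m^N_0} \int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\theta(\gamma). \tag{46} \]

So if $\bar\eta^N = \frac{1}{N} \big( \bar\eta^N_{x^N_1} + \bar\eta^N_{x^N_2} + \dots + \bar\eta^N_{x^N_N} \big)$ then

\[ \mathrm{supp}(\bar\eta^N_{x^N_i}) \subset \mathop{\mathrm{argmin}}_{\gamma(0) = x^N_i} J(x^N_i, \gamma, \bar\eta^N). \]

Note that, in contrast with the case of an infinite population, this is not an equilibrium condition,
since the deviation of a player changes the measure $\bar\eta^N$ as well.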
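In the following theorem we prove that any accumulation point $\bar\eta$ of $\{\bar\eta^N\}$ satisfies:

\[ \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\bar\eta(\gamma) = \min_{\theta \in \mathcal{P}_0(\Gamma)} \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\theta(\gamma), \tag{47} \]

where $\mathcal{P}_0(\Gamma)$ is the set of measures $\theta \in \mathcal{P}(\Gamma)$ such that $e_0 \sharp \theta = m_0$. We have seen in Remark 3.7
that this condition characterizes an MFG equilibrium.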

Theorem 3.8. Assume that (4), (5), (6), (7) and (31) hold. Consider the Fictitious Play for the
$N$-player game as described in (45) and let $\bar\eta^N$ be an accumulation distribution of $(\eta^{n,N})_{n \in \mathbb{N}}$.
Then every accumulation point of the pre-compact set $\{\bar\eta^N\}_{N \in \mathbb{N}}$ is an MFG equilibrium.

If furthermore the monotonicity condition (9) holds, then $(\bar\eta^N)$ has a limit which is the MFG
equilibrium.

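Proof. Consider $\bar\eta$ as an accumulation point of the set $\{\bar\eta^N\}_{N \in \mathbb{N}}$. It is sufficient to show that for
every $\theta \in \mathcal{P}(\Gamma)$ such that $e_0 \sharp \theta = m_0$, we have

\[ \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\bar\eta(\gamma) \leq \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\theta(\gamma). \tag{48} \]

Since $m_0$ is absolutely continuous with respect to the Lebesgue measure, there exists an optimal
transport map $\tau_N : \mathbb{T}^d \to \mathbb{T}^d$ such that:

\[ \tau_N \sharp m_0 = m^N_0, \qquad \mathbf{d}_1(m_0, m^N_0) = \int_{\mathbb{T}^d} |x - \tau_N(x)|\, dm_0(x) \]

(see [6]). We define the functions $\xi_N : \Gamma \to \Gamma$ as follows:

\[ \xi_N(\gamma) = \gamma - \gamma(0) + \tau_N(\gamma(0)) \]

and set $\theta^N = \xi_N \sharp \theta$. Then we have

\[ e_0 \sharp \theta^N = e_0 \sharp (\xi_N \sharp \theta) = (e_0 \circ \xi_N) \sharp \theta = (\tau_N \circ e_0) \sharp \theta = \tau_N \sharp (e_0 \sharp \theta) = \tau_N \sharp m_0 = m^N_0. \]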
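The map $\xi_N$ simply translates a curve so that it starts from the transported initial point, leaving its velocity unchanged. A minimal sketch on time-discretized curves follows, under the simplifying assumptions that curves take values in the real line (ignoring the periodicity of the torus) and that the transport map is a hypothetical quantization map `tau_N` onto a grid of atoms; it is not claimed to be the optimal transport map of the text.

```python
def tau_N(x):
    """Hypothetical stand-in for the transport map: send x to the nearest multiple of 1/4."""
    return round(4 * x) / 4

def shift_curve(curve, tau):
    """Discrete version of xi_N(gamma) = gamma - gamma(0) + tau(gamma(0))."""
    offset = tau(curve[0]) - curve[0]
    return [x + offset for x in curve]

curve = [0.3, 0.35, 0.5]                 # samples gamma(t_0), gamma(t_1), gamma(t_2)
shifted = shift_curve(curve, tau_N)

# Increments (discrete velocities) before and after the shift.
steps_before = [b - a for a, b in zip(curve, curve[1:])]
steps_after = [b - a for a, b in zip(shifted, shifted[1:])]
```

The shifted curve starts at `tau_N(curve[0])`, which is the identity $e_0 \circ \xi_N = \tau_N \circ e_0$ used in the computation above, and its increments coincide with those of the original curve, so the kinetic part of the cost is unaffected.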
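Then the characterization (46) of $\bar\eta^N$ yields:

\[ \int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\bar\eta^N(\gamma) \leq \int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\theta^N(\gamma). \tag{49} \]

By lower semicontinuity of $J$ we have, along the subsequence converging to $\bar\eta$,

\[ \int_\Gamma J(\gamma(0), \gamma, \bar\eta)\, d\bar\eta(\gamma) \leq \liminf_N \int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\bar\eta^N(\gamma). \]

On the other hand, by the definition of $\xi_N$ and $\theta^N$ and the decomposition $\theta = \int_{\mathbb{T}^d} \theta_x\, m_0(x)\, dx$,
we have

\[
\begin{aligned}
\int_\Gamma J(\gamma(0), \gamma, \bar\eta^N)\, d\theta^N(\gamma)
= \int_{\mathbb{T}^d} \int_\Gamma \Big( &\int_0^T \big[ L(\gamma(t) - \gamma(0) + \tau_N(\gamma(0)), \dot\gamma(t)) + f(\gamma(t) - \gamma(0) + \tau_N(\gamma(0)), e_t \sharp \bar\eta^N) \big]\, dt \\
&+ g(\gamma(T) - \gamma(0) + \tau_N(\gamma(0)), e_T \sharp \bar\eta^N) \Big)\, d\theta_x(\gamma)\, m_0(x)\, dx,
\end{aligned}
\]

where, by dominated convergence, the right-hand side converges to the right-hand side of (48).
So letting $N \to \infty$ in (49) gives exactly (48).

Under (9), the MFG equilibrium is unique. Hence, for any $\varepsilon > 0$ there exists $N_\varepsilon \in \mathbb{N}$ such
that for any $N \geq N_\varepsilon$ and any accumulation point $\bar\eta^N$ we have $\mathbf{d}_1(\bar\eta, \bar\eta^N) \leq \varepsilon$. □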

Corollary 3.9. Assume (4), (5), (6), (7), (31) and (9). Then, for any $\varepsilon > 0$ there is $N_\varepsilon \in \mathbb{N}$
such that for any $N \geq N_\varepsilon$,

\[ \exists n(N, \varepsilon) \in \mathbb{N} : \quad \forall n \geq n(N, \varepsilon) : \quad \mathbf{d}_1(\eta^{n,N}, \bar\eta) \leq \varepsilon, \]

where $\bar\eta$ is the MFG equilibrium. In other words, for every $\varepsilon > 0$, one can reach the
$\varepsilon$-neighborhood of the equilibrium point if the number of players $N$ is large enough.

A Well-posedness of a continuity equation 

We consider the continuity equation

\[
\begin{cases}
\partial_t m - \mathrm{div}(m D_pH(x, \nabla u)) = 0 & \text{in } (0,T) \times \mathbb{T}^d \\
m(0, x) = m_0(x),
\end{cases}
\tag{50}
\]

where $u$ is the viscosity solution to

\[
\begin{cases}
-\partial_t u + H(x, \nabla u(t,x)) = f(x, \bar m(t)), & (t,x) \in [0,T] \times \mathbb{T}^d \\
u(T,x) = g(x, \bar m(T)), & x \in \mathbb{T}^d.
\end{cases}
\]

Let us recall that $u$ is semi-concave. In (50) we denote by $\nabla u$ any fixed Borel measurable selection
of the map $(t,x) \to D^*u(t,x)$ (the set of reachable gradients of $u$ at $(t,x)$, see [9]). The section
is devoted to the proof of the following statement.

Proposition A.1. There exists a unique solution $m$ of (50) in the sense of distributions.
Moreover $m$ is absolutely continuous and satisfies

\[ \sup_{t \in [0,T]} \| m(t, \cdot) \|_\infty \leq C. \]
The difficulty for the proof comes from the fact that the vector field $-D_pH(x, \nabla u)$ is
not smooth: it is even discontinuous in general. The analysis of transport equations with
non-smooth vector fields has attracted a lot of attention since the seminal paper of DiPerna and Lions [16].
We face here a simple situation where the vector field generates almost everywhere a unique
solution. Nevertheless uniqueness of the solution of the associated continuity equation requires the
combination of several arguments. We rely here on Ambrosio's approach [4, 5], in particular for
the "superposition principle" (see Theorem A.3 below).

Let us start with the existence of a bounded solution to (50): this is the easy part.

Lemma A.2. There exists a solution to (50) which belongs to $L^\infty$.

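Proof. We follow (at least partially) the perturbation argument given in the proof of Theorem
5.1 of [11]. For $\varepsilon > 0$, let $(u^\varepsilon, m^\varepsilon)$ be the unique classical solution to

\[
\begin{cases}
-\partial_t u^\varepsilon - \varepsilon \Delta u^\varepsilon + H(x, \nabla u^\varepsilon) = f(x, \bar m(t)) & \text{in } (0,T) \times \mathbb{T}^d \\
\partial_t m^\varepsilon - \varepsilon \Delta m^\varepsilon - \mathrm{div}(m^\varepsilon D_pH(x, \nabla u^\varepsilon)) = 0 & \text{in } (0,T) \times \mathbb{T}^d \\
m^\varepsilon(0,x) = m_0(x), \quad u^\varepsilon(T,x) = g(x, \bar m(T)) & \text{in } \mathbb{T}^d.
\end{cases}
\]

Following the same argument as in [11], we know that the $(m^\varepsilon)$ are uniformly bounded in $L^\infty$:
there exists $C > 0$ such that

\[ \| m^\varepsilon \|_\infty \leq C \qquad \forall \varepsilon > 0. \]

Moreover (by semi-concavity) the $(\nabla u^\varepsilon)$ are uniformly bounded and converge a.e. to $\nabla u$ as $\varepsilon$
tends to 0. Letting $\varepsilon \to 0$, we can extract a subsequence such that $m^\varepsilon$ converges in $L^\infty$-weak*
to a solution $m$ of (50). □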

The difficult part of the proof of Proposition A.1 is to check that the solution to (50) is unique.
Let us first point out some basic properties of the solution $u$: we already explained that $u$ is
Lipschitz continuous and semiconcave in space for any $t$, with a modulus bounded independently
of $t$. We will repeatedly use the fact that $u$ can be represented as the value function of a problem
of calculus of variations:

\[ u(t,x) = \inf_{\gamma,\ \gamma(t) = x} \int_t^T \tilde L(s, \gamma(s), \dot\gamma(s))\, ds + \tilde g(\gamma(T)), \tag{51} \]

where we have set, for simplicity of notation,

\[ \tilde L(s, x, v) = L(x, v) + f(x, \bar m(s)), \qquad \tilde g(x) = g(x, \bar m(T)). \]

For $(t,x) \in [0,T) \times \mathbb{T}^d$ we denote by $\mathcal{A}(t,x)$ the set of optimal trajectories for the control problem
(51).

We need to analyze precisely the connection between the differentiability of $u$ with respect to
the $x$ variable and the uniqueness of the minimizer in (51) (see [9], Theorems 6.4.7 and 6.4.9 and
Corollary 6.4.10). Let $(t,x) \in [0,T] \times \mathbb{T}^d$ and $\gamma \in \Gamma$. Then

1. (Uniqueness of the optimal control along optimal trajectories) Assume that $\gamma \in \mathcal{A}(t,x)$.
Then, for any $s \in (t,T)$, $u(s, \cdot)$ is differentiable at $\gamma(s)$ and one has
$\dot\gamma(s) = -D_pH(\gamma(s), \nabla u(s, \gamma(s)))$.

2. (Uniqueness of the optimal trajectories) $\nabla u(t,x)$ exists if and only if $\mathcal{A}(t,x)$ is reduced
to a singleton. In this case, $\dot\gamma(t) = -D_pH(x, \nabla u(t,x))$ where $\mathcal{A}(t,x) = \{\gamma\}$.

3. (Optimal synthesis) Conversely, if $\gamma(\cdot)$ is an absolutely continuous solution of the differential
equation

\[
\begin{cases}
\dot\gamma(s) = -D_pH(\gamma(s), \nabla u(s, \gamma(s))) & \text{a.e. in } [t,T] \\
\gamma(t) = x,
\end{cases}
\tag{52}
\]

then the trajectory $\gamma$ is optimal for $u(t,x)$. In particular, if $u(t, \cdot)$ is differentiable at $x$,
then equation (52) has a unique solution, corresponding to the optimal trajectory.

The next ingredient is Ambrosio's superposition principle, which says that any weak solution
to the transport equation

\[ \partial_t \mu - \mathrm{div}(\mu D_pH(x, \nabla u)) = 0 \qquad \text{in } (0,T) \times \mathbb{T}^d \tag{53} \]

can be represented by a measure on the space of trajectories of the ODE

\[ \dot\gamma(s) = -D_pH(\gamma(s), \nabla u(s, \gamma(s))). \tag{54} \]

Theorem A.3 (Ambrosio superposition principle). Let $\mu$ be a solution to (53). Then there
exists a Borel probability measure $\eta$ on $C^0([0,T], \mathbb{T}^d)$ such that $\mu(t) = e_t \sharp \eta$ for any $t$ and, for
$\eta$-a.e. $\gamma \in C^0([0,T], \mathbb{T}^d)$, $\gamma$ is a solution to the ODE (54).

See, for instance, Theorem 8.2.1 of [6].


We are now ready to prove the uniqueness part of the result:

Proof of Proposition A.1. Let $\mu$ be a solution of the transport equation (53) with initial
condition $m_0$. From Ambrosio's superposition principle, there exists a Borel probability measure $\eta$
on $C^0([0,T], \mathbb{T}^d)$ such that $\mu(t) = e_t \sharp \eta$ for any $t$ and, for $\eta$-a.e. $\gamma \in C^0([0,T], \mathbb{T}^d)$, $\gamma$ is a solution
to the ODE $\dot\gamma(s) = -D_pH(\gamma(s), \nabla u(s, \gamma(s)))$. As $m_0 = e_0 \sharp \eta$, we can disintegrate the measure $\eta$ into
$\eta = \int_{\mathbb{T}^d} \eta_x\, dm_0(x)$, where $\gamma(0) = x$ for $\eta_x$-a.e. $\gamma$ and $m_0$-a.e. $x \in \mathbb{T}^d$. Since $m_0$ is absolutely
continuous, for $m_0$-a.e. $x \in \mathbb{T}^d$, $\eta_x$-a.e. map $\gamma$ is a solution to the ODE starting from $x$. By
the optimal synthesis explained above, such a solution $\gamma$ is optimal for the calculus of variations
problem (51). As, moreover, for a.e. $x \in \mathbb{T}^d$ the solution of this problem is reduced to a singleton
$\{\bar\gamma_x\}$, we can conclude that $\eta_x = \delta_{\bar\gamma_x}$ for $m_0$-a.e. $x \in \mathbb{T}^d$. Hence, for any continuous map
$\phi : \mathbb{T}^d \to \mathbb{R}$, one has

\[ \int_{\mathbb{T}^d} \phi(x)\, \mu(t, dx) = \int_{\mathbb{T}^d} \phi(\bar\gamma_x(t))\, m_0(x)\, dx, \]

which defines $\mu$ in a unique way. □
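The representation formula obtained at the end of this proof says that integrals against $\mu(t)$ are computed by transporting the initial measure along the optimal trajectories. The sketch below illustrates it under strong simplifying assumptions that are ours and not the text's: the state space is the real line, the vector field is the smooth field $b(x) = -x$ (so that the flow is explicit), and $m_0$ is replaced by a few weighted atoms playing the role of a quadrature rule.

```python
import math

def flow(x, t):
    """Exact solution at time t of the assumed ODE gamma' = -gamma with gamma(0) = x."""
    return x * math.exp(-t)

def integrate_against_mu(phi, atoms, t):
    """Integral of phi against mu(t), the push forward of m0 by x -> flow(x, t).

    atoms is a list of (point, weight) pairs standing for m0.
    """
    return sum(w * phi(flow(x, t)) for x, w in atoms)

m0_atoms = [(-1.0, 0.25), (0.5, 0.25), (2.0, 0.5)]   # hypothetical discretization of m0
mass_at_1 = integrate_against_mu(lambda y: 1.0, m0_atoms, 1.0)
mean_at_1 = integrate_against_mu(lambda y: y, m0_atoms, 1.0)
```

The total mass is preserved and, for this linear flow, the mean of $\mu(t)$ is the mean of the initial measure multiplied by $e^{-t}$.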